The Prediction of Proficiency of Taxicab Drivers


Clarence W. Brown and Edwin E. Ghiselli 


University of California, Berkeley 


In the evaluation of devices for use in the 
selection of operators of public conveyances, 
greatest attention has been given to the cri- 
terion of accidents. In some instances labor 
turnover has been considered, but safety of 
performance has been given greater emphasis. 
While the importance of accidents and labor 
turnover certainly is not to be minimized, 
it should be apparent that the success of 
operators of vehicles can be measured in 
other important ways. In the taxicab indus- 
try, for example, job success can be gauged 
in terms of the dollar volume of business that 
the driver achieves. The economic health of 
a taxicab company can be improved by reduc- 
ing costs due to accidents and personnel re- 
placements, but it is more directly related to 
the monetary return accomplished from the 
sale of its services. It is apparent, therefore, 
that the selection of individuals who can sell 
their services as taxicab drivers is worthy of 
consideration. 

Very little information is available on the 
effectiveness of predicting the productivity of 
taxicab drivers. Wechsler reports inconsistent 
correlations between sales and intelligence test 
scores (4). Viteles found intelligence tests 
to be of no value but obtained substantial pre- 
dictions from a weighted personal data blank 
(3). The results of investigations conducted 
to date provide little help in planning an ex- 
perimental program for driver selection. 


Criteria 
The amount of business conducted by a 
taxicab company is subject to a number of 
uncontrolled variables. It is affected by such
obvious factors as weather and season. But
in addition, sales are sensitive to other types
of occurrences such as large civic entertain-
ments, conventions, the payment of bonuses
by some large local organization, and the like.
In many instances sales rise or fall for no
discernible reason. Since these variations
may be as great as 100%, it is apparent that
corrections must be applied to a driver's sales 
in order to compensate for the time trends 
in the volume of business. In the present in- 
vestigation the average sales for all drivers 
were computed for each week, and the pro- 
ductivity of each driver was expressed as a 
percentage of this average. This procedure 
controlled most but not all of the time trends. 
These weekly indices formed the basis of the 
production criteria employed in the valida- 
tion studies reported here. 
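
The weekly index described above can be sketched in a few lines. This is a minimal illustration and not the authors' own computation: the function name `weekly_indices`, the data layout, and the sales figures are all hypothetical.

```python
from statistics import mean


def weekly_indices(sales_by_week):
    """Express each driver's weekly sales as a percentage of that week's average.

    sales_by_week: list with one dict per week, mapping driver -> sales.
    Returns a dict mapping driver -> list of weekly indices.
    """
    indices = {}
    for week in sales_by_week:
        week_avg = mean(week.values())  # average sales of all drivers that week
        for driver, sales in week.items():
            indices.setdefault(driver, []).append(100.0 * sales / week_avg)
    return indices


# Hypothetical figures: business doubles in week 2 for every driver.
sales = [
    {"A": 60.0, "B": 40.0},
    {"A": 120.0, "B": 80.0},
]
idx = weekly_indices(sales)
# Driver A stays at 120 per cent of average in both weeks, B at 80 per cent,
# so the week-to-week swing in total business drops out of the index.
```

Because each week is scaled by its own average, a driver's index reflects standing relative to the other drivers rather than the volume of business that week.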

One characteristic of the taxicab industry 
which is pertinent to all selection studies is 
the high rate of turnover among the drivers. 
A person who has been with a company for a 
year is considered to be an “old hand.” This 
high labor turnover means that production 
records for any extensive periods of employ- 
ment are not obtainable for large numbers of 
drivers working under relatively homogeneous 
conditions. In the present investigation pro- 
duction during the first eighteen weeks of em- 
ployment was used. In spite of the fact that 
this period of time is relatively short, the re- 
liability of the measures of proficiency was 
quite satisfactory. The coefficient of correla- 
tion between production indices on odd and 
even weeks, corrected by the Spearman-Brown 
formula, was found to be .96. 
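
As a brief sketch of the correction mentioned above, the Spearman-Brown formula for a measure lengthened k times is R = k r / (1 + (k - 1) r). The uncorrected odd-even correlation is not reported in the text; the value computed below is back-calculated from .96 and should be read as an inference, not a figure from the study. Function names are illustrative.

```python
def spearman_brown(r_part, k=2.0):
    """Reliability of a measure k times as long as the part with reliability r_part."""
    return k * r_part / (1.0 + (k - 1.0) * r_part)


def part_from_full(r_full, k=2.0):
    """Invert the Spearman-Brown formula to recover the part correlation."""
    return r_full / (k - (k - 1.0) * r_full)


r_half = part_from_full(0.96)    # implied odd-even correlation, about .92
r_full = spearman_brown(r_half)  # stepping back up recovers .96
```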

A cross validation study of the tests was 
conducted in a second and smaller company. 
Due to the particular accounting methods of 
this company, sales records were not avail-
able in a usable form. The manager of the
company, however, provided ratings of his 
drivers’ productivity on a six-point scale. 
In making the ratings the manager discussed 
each driver with the investigators, thus care- 
fully reviewing the driver’s achievement be- 
fore placing him in one of the rating cate- 
gories. While no evidence of reliability was 
obtained, in terms of distribution statistics, at 
least, the ratings were satisfactory. The rat- 
ings were made on the men after three months 
of employment. 


Subjects 


The subjects in the present investigation 
were men who applied for work as taxicab 
drivers and who were hired. Only those cases 
were used who had no previous experience in 
driving taxicabs. They did vary, however, 
in the amount of experience they had had in 
driving other types of commercial vehicles. 

Various selective factors operated so that 
the subjects used were by no means repre- 
sentative of the entire range of talent of ap- 
plicants. Prior to being hired the men were 
interviewed and took a driver road test. 
About 20% were rejected on these bases. In
addition, about another 20% were rejected
on the basis of very poor scores on the apti-
tude tests to be described here. As a final
selective factor, only those cases were used 
who remained on the job either 18 or 12 weeks 
or more. In the two companies used in the 
present investigation, approximately 40% of 
the drivers left their jobs within the 18 or 12 
week periods. The men utilized in this study, 
then, represent about 20% of the applicants; 
those who survived the hiring procedures and 
remained on the job at least 18 weeks for the 
larger company and 12 weeks for the smaller 
company. For the basic validation study, 54 
men were drawn from the first company, and 
for the cross validation study 29 men were 
drawn from the second company. 


Predictor Variables 


Seven aptitude tests were utilized together 
with an interest inventory. All of the tests 
were time limited, and all were of the paper 
and pencil variety. As indicated earlier, the 
tests and the inventory were administered 
prior to hiring. The choice of the particular
measures utilized was dictated by an interest 
in predicting several aspects of success rather 
than concentrating on sales alone. Certain of 
the measures were found to predict accidents 
and labor turnover (1, 2). 

An arithmetic test was employed which in- 
volved problems in making change and com- 
puting fares. A test, termed Speed of Reac- 
tions, presented the individual with a series 
of rules that he was to use in making differ- 
ential responses to various spatial arrange- 
ments and organizations of letters. Some in- 
dication of motor speed and precision was 
obtained from dotting and tapping tests. 
The dotting test called for the placing of a 
single dot in each of a series of irregularly 
spaced circles. In the tapping test only 
speed was required, the individual tapping as 
rapidly as possible with his pencil, placing 
three dots in each of a series of circles. 

Two tests of spatial ability were adminis- 
tered which primarily involved the ability to 
detect differences in distances. In the Judg- 
ment of Distance test each item was a schema- 
tized table top on which rested four cubes of 
equal size. On the basis of perspective and 
interposition the individual judged which 
cubes were nearest together. The Distance 
Discrimination test called for the discrimina- 
tion of linear distances between points. A 
Mechanical Principles test was used which 
consisted of a series of pictorially presented 
problems each of which required knowledge 
of some simple principle of mechanics. 

In the interest inventory each item involved 
a pair of occupations or jobs, and the individ- 
ual chose the one of each pair which he pre- 
ferred. The choices were between a higher 
and a lower occupation, a job performed out- 
side as compared with one performed inside, 
a job involving dealing with people rather 
than one not requiring such activity, and a job 
involving moving about rather than one re- 
quiring sedentary activity. 


Results 


Table 1 gives the validity coefficients for
the various predictor variables using the
sales production criterion for the basic group
of 54 drivers. With the possible exception of
the arithmetic test, none of the predictors
alone would be considered to give adequate
prediction. When the extent of restriction in
range of talent is considered, however, low
coefficients assume some importance. With
the exception of the Judgment of Distance
test, all measures would seem to merit further
study.


Table 1
Validity Coefficients of Several Tests for Predicting
Sales Production of 54 Taxicab Drivers

                              Validity
Test                          Coefficient

Arithmetic                       .29
Speed of Reaction               -.19
Dotting                          .21
Tapping                          .18
Judgment of Distance            -.03
Distance Discrimination          .24
Mechanical Principles            .13
Interest Inventory               .20

A simple combination of test scores was 
effected by eliminating the Judgment of Dis- 
tance test, assigning unit weight to each of 
the others, and assigning a negative value to 
the Speed of Reaction test. In effect, this 
composite score was the sum of the standard 
scores of the individual tests. The validity 
of this battery score for the 54 basic cases 
was .39, which is a reasonably satisfactory 
prediction. 
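
The composite described above (unit weights, Judgment of Distance dropped, Speed of Reaction entered negatively) can be sketched as a signed sum of standard scores. This is an illustration under assumptions: the dictionary keys, the helper names, and the raw scores for three applicants are hypothetical, and population standard deviations are used because the text does not say which form was employed.

```python
from statistics import mean, pstdev

# Unit weights; Speed of Reaction is reversed and Judgment of Distance omitted.
WEIGHTS = {
    "arithmetic": 1,
    "speed_of_reaction": -1,
    "dotting": 1,
    "tapping": 1,
    "distance_discrimination": 1,
    "mechanical_principles": 1,
    "interest_inventory": 1,
}


def standardize(values):
    """Convert raw scores to standard scores (mean 0, SD 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def battery_scores(raw):
    """raw: dict mapping test name -> list of raw scores, one per applicant."""
    n = len(next(iter(raw.values())))
    totals = [0.0] * n
    for test, weight in WEIGHTS.items():
        for i, z in enumerate(standardize(raw[test])):
            totals[i] += weight * z
    return totals


# Hypothetical raw scores for three applicants.
raw = {
    "arithmetic": [10, 20, 30],
    "speed_of_reaction": [30, 20, 10],
    "dotting": [5, 6, 7],
    "tapping": [50, 60, 70],
    "distance_discrimination": [1, 2, 3],
    "mechanical_principles": [8, 9, 10],
    "interest_inventory": [12, 14, 16],
}
totals = battery_scores(raw)
```

In this toy case the third applicant is highest on every positively weighted test and lowest on Speed of Reaction, so he receives the highest composite.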

As is well known, there is almost always a 
shrinkage in validity coefficients in cross vali- 
dation studies. The test weights mentioned 
above were used in validating the scores of the 
29 cases in the second company. In this cross
validation the validity of the battery was
found to be .29. While this value may not
appear to be particularly significant, it is to be
remembered that in addition to restriction of
range of talent this coefficient is affected by
the use of a somewhat different criterion.

3. Viteles, M. S. Industrial psychology.

In view of the complex nature of the pro- 
duction criterion, it is surprising that such 
tests as dotting, tapping, and discrimination 
of distances have any predictive power at all. 
No logic would lead an investigator to em- 
ploy such tests in predicting sales of taxicab 
service. To be sure, the extent of prediction 
by individual tests was low, but the com- 
bination of tests gave a usable index of 
aptitude. 


Summary 


Seven tests and an interest inventory were 
administered to 54 taxicab drivers and vali- 
dated against their sales. With one possible 
exception, no single test gave adequate pre-
diction. A simple weighted combination of
the tests yielded a validity of .39. When the
weighted battery was applied to another group
of 29 drivers it was found to have a validity
of .29 in the prediction of ratings of job
proficiency.

Received March 16, 1953 


Some Measured Characteristics of Air Force Weather Forecasters
and Success in Forecasting


James J. Jenkins 


University of Minnesota 


A review of the psychological and meteoro- 
logical literature reveals that little is known 
about the measured psychological character- 
istics of weather forecasters, and this writer 
has found no studies relating such character- 
istics to occupational success. In the past it 
appears that high scholastic ability or achieve- 
ment has been accepted as essential. Selec- 
tion practice in the AAF Technical Training 
Command during World War II stressed high
scores on tests of academic ability, mathe- 
matics, and physics (e.g. 10, 11, 12) since the 
Weather Forecasting course was believed to 
be one of the most difficult courses offered in 
the technical training schools. Success in the 
course showed low positive correlations with 
the AGCT and mathematics tests (e.g. 8, 9). 
Harrell (2) in a survey of AGCT scores of 
209 AAF technical specialties found the en- 
listed weather forecasters to be the highest 
ranking group with a median score of 136.7. 
This perhaps indicates only that the screen- 
ing on intelligence was very effective. 

The purpose of the present study was: (1) 
to determine how Air Force forecasters are 
differentiated from a more general population;
and (2) to disclose the extent to which cer- 
tain measures are associated with ability to 
forecast weather. 


Procedure 


In 1948 the writer secured the cooperation
of the Air Weather Service for a study of
some of the psychological characteristics of
forecasters and the possible relation of these
characteristics to success in forecasting. A
study of available job descriptions and lists
of qualifications (e.g. 13, 14) and a job analy-
sis from these sources and the writer’s own
experience as a forecaster resulted in the selec-
tion of the following variables for considera-
tion as related to success: education, college
major, mathematics background, forecasting
and observing experience, kind of meteoro-
logical training, forecasting aids most fre-
quently used, speed and accuracy of percep-
tion, spatial relations ability, general academic
ability, and vocational interests. Information
on all but the last four of these was gathered
by means of a questionnaire. The remaining
variables were measured by the Minnesota
Clerical Test, the Revised Minnesota Paper
Form Board, the Ohio State University Psy-
chological Test, and the Strong Vocational In-
terest Blank for Men. The tests were admin-
istered to the forecasters by the Air Weather
Service and the results returned to the writer.


1 This study was made possible by the cooperation
of the Air Weather Service, U. S. Air Force. It was
undertaken with the encouragement of General D. N.
Yates, then Chief of the Air Weather Service. The
writer is especially indebted to Prof. Donald G.
Paterson for his assistance and guidance in every
phase of the study. This paper is part of the writer's
Ph.D. thesis on file in the library of the University
of Minnesota under the title of “Prediction of fore-
casting efficiency for Army weather forecasters,”
1950.

The Criterion 


The problem of obtaining criterion data had 
been encountered by the Air Weather Service 
early in World War II. Muller (5) in a re-
view of the literature on verification of fore-
casts points out that no less than 54 methods 
of evaluation were proposed between 1893 
and 1943 and that all of these have been 
vigorously criticised. After a long program 
of experimentation by the Weather Informa- 
tion Branch, a special verification method was 
devised by Lt. M. J. Slonim (15) which 
seemed to avoid most of the usual difficulties. 
This procedure consisted of evaluating the
probabilities of occurrence of given values for 
each forecast element (pressure, temperature, 
precipitation, visibility, and ceiling) from 
climatological data for the time and location 
being forecast. A scale of 30 equal prob- 
ability units (or trentiles) was set up by 
which observed and forecast values could be 
compared. Discrepancies were summed in
probability units to indicate the relative ac- 
curacy of forecasts. For example, to score 
pressure forecasts for a given station at a 
given time, we would proceed as follows: (1) 
gather past observations for this period of the 
year for this locality and make a frequency 
distribution of observed pressures; (2) divide 
this frequency distribution into 30 intervals 
of equal or nearly equal probability of oc-
currence (see Table 1 for a partial example);
and (3) score forecasts now obtained in terms 
of the number of equal probability intervals 
(or trentiles) in the discrepancy between the 
forecast and observed values. (In our ex- 
ample, if a pressure of 995 is observed and a 
pressure of 1000.8 is forecast, the score is 
zero. If the forecast is 1003.0 the score is 
one. If the forecast is 1033.0 the score is 
thirty, etc.) 
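
The scoring steps above can be sketched as a lookup followed by a difference. This is a minimal illustration, assuming a forecast is scored as the absolute difference between the trentile of the observed value and the trentile of the forecast value. The function names are illustrative. Only the first two boundaries (1001.2 and 1003.9) and the lower edge of the thirtieth interval (1031.8) come from Table 1; the 26 intermediate boundaries are hypothetical fillers, since that part of the table is not given. The sketch checks only the first two cases of the worked example.

```python
from bisect import bisect_left


def trentile(value, upper_bounds):
    """Return the trentile (1 to 30) containing value.

    upper_bounds holds the 29 inclusive upper limits of trentiles 1 to 29
    in ascending order; anything above the last limit is trentile 30.
    """
    return bisect_left(upper_bounds, value) + 1


def forecast_score(observed, forecast, upper_bounds):
    """Discrepancy between forecast and observation in trentile units."""
    return abs(trentile(observed, upper_bounds) - trentile(forecast, upper_bounds))


# Boundaries for Station "X": the two lowest and the highest follow Table 1,
# the 26 in between are hypothetical placeholders.
bounds = [1001.2, 1003.9] + [1003.9 + i for i in range(1, 27)] + [1031.7]

same_interval = forecast_score(995.0, 1000.8, bounds)      # both in trentile 1
adjacent_interval = forecast_score(995.0, 1003.0, bounds)  # trentile 1 vs. 2
```

Summing such discrepancies over a forecaster's forecasts gives a total in probability units, with smaller totals indicating greater accuracy.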

Table 1

A Hypothetical Frequency Distribution of Barometric
Pressures and the Resulting Trentile Table
for Station “X” for a Given
Thirty-Day Period

   Pressure Distribution                     Trentile Table

Pressure in    Number of
Millibars      Observations     Values of Element                   Trentiles

  998.7             1
  999.4             1
 1000.3             1           Less than and including 1001.2          1
 1000.8             1
 1001.2             1

 1001.7             1
 1003.1             1           From 1001.3 to 1003.9 inclusive         2
 1003.7             1
 1003.9             3

   ...             ...          ...                                    ...

 1032.2             2
 1033.5             1           From 1031.8 to any greater value       30
 1033.9             2

Note: The middle portion of this table is omitted
because of excessive length.


Each Air Force forecaster in the United
States was required to make at least three
forecasts per week for five widely scattered
stations selected by the Weather Information
Branch. All forecasts were made from the
1230 Greenwich time maps. The data avail-
able to the forecasters were approximately
the same regardless of their location in the
country.

This program ran from 1943 to 1945 and 
furnished the criterion data for this “post-
diction” study. The criterion yielded a re-
liability of .90 (estimated from a part-whole 
correlation of .70 between 8 weeks of the pro- 
gram and the total 84-week program). It is 
unfortunate that these data are now available 
only in terms of standard scores so that the 
relative accuracy of the forecasts in terms of 
initial values is unknown. The homogeneity 
or heterogeneity of the forecasters as a group 
is impossible to assay even in terms of prob- 
ability deviations. It is also regrettable that 
the conditions under which the forecasts were 
made (time pressure, amount of other work, 
freedom from interference, etc.) could not 
possibly be equated. The validity evidence is 
largely “face validity” (15). 
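
The text does not say how the reliability of .90 was derived from the part-whole correlation of .70. One plausible route, offered here only as an assumption, treats the 84-week total as 84/8 = 10.5 parallel 8-week parts: the part-whole correlation then implies a part reliability, which is stepped up by the Spearman-Brown formula. The function name is illustrative.

```python
def reliability_from_part_whole(r_pw, k):
    """Estimate whole-measure reliability from a part-whole correlation.

    Assumes the whole is the sum of k parallel parts, so that
    r_pw**2 = (1 + (k - 1) * r_part) / k, and then applies Spearman-Brown.
    """
    r_part = (k * r_pw ** 2 - 1.0) / (k - 1.0)
    return k * r_part / (1.0 + (k - 1.0) * r_part)


estimate = reliability_from_part_whole(0.70, 84 / 8)  # about .89
```

Under these assumptions the estimate comes out near .89, in line with the reported figure of .90, though the writer's actual procedure may have differed.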


The Sample 


The sample for this study was sharply 
restricted by three conditions. First, the fore- 
caster must have participated in the criterion 
study. Second, he must have remained in the 
Air Weather Service until 1948. Third, in 
1948 he must have been stationed in or near 
the United States so that he could be tested. 
Only 92 forecasters met all these conditions 
and constituted the sample for this study. 
The forecasting scores of the sample were 
compared to those of the total group par-
ticipating in the verification program (N
= 2023) and were found to resemble them
closely (χ² = 2.832; P = .88).

Characteristics of the Sample 

General. All but two of the forecasters 
graduated from high school, but only 30 had 
graduated from college. Three had Ph.D. de- 
grees and 12 others had done post-graduate 
work. Average education was 14.3 years with 
a standard deviation of 2.2 years. A total of 
57 indicated college majors and of these 49 
were in the natural sciences or mathematics. 
Before starting meteorology training the aver-
age number of college mathematics courses
was three. The range was from no courses 
to two Ph.D.’s in mathematics. All of the 
sample had received training in meteorology 
in military schools under contract with the 
Air Forces. Of the sample, 71 per cent had 
previous experience as weather observers. 

Test Data. Means and standard deviations 
in raw scores for the ability tests are given in 
Table 2. Reference to relevant norm groups 
reveals the forecasters to be a highly selected 
group on all of these variables. 


Table 2
Means and Standard Deviations of Forecasters
on Ability Tests

                                                        Standard
Tests                                         Mean      Deviation

Minnesota Clerical Test
  Numbers                                     141.1
  Names                                       145.8
Revised Minnesota Paper Form Board             50.1
Ohio State University Psychological Exam.
  Part I
  Part II
  Part III
  Total

The mean scores on the Minnesota Clerical 
Test fall at the 95th and 93rd percentiles for 
Numbers and Names sections respectively 
when compared to gainfully occupied adults 
and at the 60th and 73rd percentiles when 
compared to employed clerical workers them- 
selves (1). 

The mean score on the revised Minnesota 
Paper Form Board when compared to the 
norms of various male industrial groups (3) 
falls from the 80th to the 97th percentile with 
a median value at the 90th percentile. Even 
compared to first and fifth year engineering 
students the percentile ranks are 80 and 70 
respectively. 

For the Ohio State University Psychological
Test the forecasters were compared with col-
lege freshmen as a group which roughly ap-
proximated the pre-army educational status
of the sample. On this basis the mean for
the forecasters falls at the 87th percentile in
Part I (Same-opposites), 81st in Part II
(Word relationships), 90th in Part III (Read-
ing comprehension), and 87th for the total
score (7).


Table 3

Means, Standard Deviations, and Percentage of A and
B+ Ratings for Each Strong Key for Total
Sample of 92 Weather Forecasters

                                          Standard     Percentage of
Group   Occupation               Mean     Deviation    A and B+ Ratings

I       Artist                   19.8       10.0
        Psychologist             19.4       12.4
        Architect                28.3       10.5
        Physician                28.7        9.7
        Osteopath                32.2        9.7
        Dentist                  27.9        9.9

II      Mathematician            25.5        9.9
        Physicist                22.7       13.8
        Engineer                 40.8       10.1
        Chemist                  38.2       11.5

III     Production Manager       42.6        8.2

IV      Farmer                   40.3        9.2
        Aviator                  43.2       10.2
        Carpenter                31.7
        Printer                  39.7
        Math. Phys. Sci. T.      43.3
        Policeman                36.9
        Forest Service Man       35.1

V       YMCA Phys. Director      30.5
        Personnel Director       37.0
        Public Admin.            43.3
        YMCA Secretary           22.7
        Soc. Sci. H. S. T.       31.1
        City School Supt.        23.4
        Minister                 18.5

VI      Musician                 28.5

VII     C.P.A.                   26.0

VIII    Accountant               36.0
        Office Man               37.8
        Purchasing Agent         34.2
        Banker                   27.5
        Mortician                27.0

IX      Sales Manager            28.1
        Real Est. Sales.         31.2
        Life Insur. Sales.       22.1

X       Advertising Man          26.8
        Lawyer                   25.8
        Author-Journalist        25.6

XI      Pres.-Mfg. Concern       29.8

        Interest Maturity        55.2
        Occupational Level       52.6
        Masc.-Fem.               54.6

It is readily apparent that the forecasters 
are a superior group on all three of these 
tests. While this superiority might be ex- 
pected on the Ohio in view of the initial 
screening, their superiority on the other two 
tests is not readily explained. 

The means and standard deviations for 
each of the Strong Vocational Interest Blank 
keys are given in Table 3 in terms of the oc- 
cupational standard scores. The high mean 
scores of the meteorologists (B+ and B) 
show their interests to be similar to those 
of persons in the occupations of Engineer, 
Chemist, Production Manager, Farmer, Avia- 
tor, Printer, Mathematics-Physical Science 
Teacher, Policeman, Forest Service Man, Per- 
sonnel Director, Public Administrator, Ac- 
countant, Office Man, and Purchasing Agent. 
Their interests are most markedly dissimilar 
(scores in the C area) to those persons in the 
occupations of Actor, Banker, Mortician, Real 
Estate Salesman, Life Insurance Salesman, 
Advertising Man, Lawyer and Author-Jour- 
nalist. Most of the rest of the scores were in 
or near the chance range. 

If one views these occupations in terms of 
Strong’s factor analysis data (6), the simi- 
larity of this grouping of the occupations to 
Factor III (“Language” or “things versus 
people”) is immediately apparent. All of the 
occupations in which the meteorologists score 
high have positive loadings on this factor (in 
the direction of “non-language” and “things”)
and all the occupations in which they score
low have negative loadings (in the direction
of “language” and “people”).

The picture of the forecasters seen in the 
interest test results is one of a technical, 
skilled-trades interest group with little verbal- 
linguistic, pure science, or social service in- 
terest. The relatively low OL score received 
by the group seems to reflect the technical 
skilled-trades kind of picture already given. 

In order that the findings might be sub-
jected to cross-validation the sample was split
in half. Individuals were paired on criterion 
scores and for each pair a random determina- 
tion was made as to which member fell in the 
first group and which into the second group. 
A double cross-validation technique (4) was 
used, prediction devices being prepared on 
each group and validated on the other group. 
Regression, cutting scores, and profile tech- 
niques were utilized. The final results of 
this procedure are summarized here rather 
than presenting the work in detail. 
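
The matched split and double cross-validation described above can be sketched as follows. This is an illustration only: the data are synthetic, the helper names are hypothetical, and a single-predictor regression line stands in for the regression, cutting-score, and profile devices actually used. With one predictor the cross-validated correlation simply mirrors the raw correlation in the other half; the procedure matters more when weights or cutting scores are fitted to several measures.

```python
import random
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def matched_split(cases, rng):
    """Pair cases on criterion score, then assign one of each pair at random.

    cases: list of (test_score, criterion_score) tuples.
    """
    ordered = sorted(cases, key=lambda c: c[1])
    first, second = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)
        first.append(pair[0])
        second.append(pair[1])
    return first, second


def fit_line(cases):
    """Least-squares slope and intercept for predicting criterion from test."""
    x = [c[0] for c in cases]
    y = [c[1] for c in cases]
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def validity(model, cases):
    """Correlate a fitted model's predictions with the criterion in new cases."""
    slope, intercept = model
    predicted = [slope * c[0] + intercept for c in cases]
    return pearson(predicted, [c[1] for c in cases])


def double_cross_validate(cases, rng):
    """Fit on each half and validate on the other."""
    first, second = matched_split(cases, rng)
    return validity(fit_line(first), second), validity(fit_line(second), first)


# Synthetic stand-in for 92 forecasters: (test score, criterion score).
rng = random.Random(0)
cases = []
for _ in range(92):
    score = rng.gauss(0.0, 1.0)
    cases.append((score, 0.3 * score + rng.gauss(0.0, 1.0)))

r_first_to_second, r_second_to_first = double_cross_validate(cases, rng)
```

A predictor would be counted as consistent only if it held up in both directions.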

No consistent differences were found be- 
tween the better and poorer forecasters with 
respect to age, rank, education, college major, 
mathematics background, forecasting and ob- 
serving experience, kind of training, forecast- 
ing aids most frequently used, interest test 
profiles or scores on the Revised Minnesota 
Paper Form Board. The Ohio State Univer- 
sity Psychological Test proved to be of little 
use in discriminating degrees of success in 
forecasting but of the 27 persons who scored 
low (below 44) on Part III (reading com- 
prehension), 19 of them fell in the lower half 
of the forecasting group. High scorers were 
not, however, distinguished from other fore- 
casters. 

The Numbers section of the Minnesota 
Clerical Test proved to be of no value in the 
prediction of forecasting accuracy but the 
Names section proved to be of considerable 
value. In both halves of the sample it cor- 
related + .31 with the criterion. When used 
alone with a cutting score it consistently 
eliminated at least twice as many cases from 
the lower half of the criterion group as from 
the upper half. When used in profile rela- 
tionship with the Numbers section of the test 
and the Revised Minnesota Paper Form 
Board, it eliminated 35 per cent of the cases 
in the lower half and only 4 per cent of the 
cases in the upper half. (This amounts to a 
crude use of the other two tests as suppressor 
variables. They correlate positively with the 
Names section and essentially zero with the 
criterion.)
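
The way a cutting score is judged in the passage above, by the share of cases it removes from each half of the criterion group, can be sketched as below. The function name, the cutting score, and the eight cases are hypothetical; the profile use of the other two tests is not attempted.

```python
from statistics import median


def elimination_rates(cases, cut):
    """Share of lower-half and upper-half criterion cases falling below cut.

    cases: list of (test_score, criterion_score) tuples.
    Returns (rate in lower criterion half, rate in upper criterion half).
    """
    mid = median(c[1] for c in cases)
    lower = [c for c in cases if c[1] < mid]
    upper = [c for c in cases if c[1] >= mid]

    def rate(group):
        return sum(1 for c in group if c[0] < cut) / len(group)

    return rate(lower), rate(upper)


# Hypothetical (test score, criterion score) pairs.
cases_cut = [
    (100, 1), (110, 2), (150, 3), (120, 4),
    (140, 5), (160, 6), (115, 7), (170, 8),
]
lower_rate, upper_rate = elimination_rates(cases_cut, cut=125)
```

Here the hypothetical cut removes three of four cases from the lower half and one of four from the upper half, the kind of asymmetry the text reports for the Names section.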


Discussion


In view of the findings of this research it 
would seem that the role of speed and ac- 
curacy in perception as measured by some 
component of the Names section of the Min- 
nesota Clerical Test should be investigated 
carefully in future studies of weather fore- 
caster training and success on the job. The 
high level of clerical ability found in the 
forecasters as a group seems to argue that 
some kind of selection on this variable is 
already taking place, and the correlation with 
forecast verification seems to indicate that 
this is an important though not an a priori 
obvious source of variation in on-the-job 
performance. 

It should be noted, however, that the search 
for predictors cannot be considered at all 
complete. Both of the tests which proved to 
have any predictive efficiency in this study 
functioned only at the lower score levels to 
provide negative selection. A study of other
abilities and the motivational and personality
characteristics of those individuals who were
high on all of the tests employed here but
still relatively low in forecasting accuracy is
obviously necessary.


Summary 


A sample of 92 Air Force Weather Fore- 
casters was studied to determine: (1) how the 
sample was differentiated from a more gen- 
eral population; and (2) the extent to which 
biographical data and psychological meas- 
ures were associated with ability to forecast. 
Subjects completed a questionnaire and four 
standard psychological tests. The sample 
proved to be similar to the World War II 
population of forecasters with respect to fore- 
casting ability as measured by the Short- 
Range Forecast Verification Program. 

The sample proved to be a highly select 
group with respect to educational background, 
clerical ability, spatial relations ability, and 
general academic ability. With respect to in- 
terests the forecasters appear to resemble a 
technical, skilled-trades interest group with 
little verbal-linguistic, pure science, or social 
service interests. 




A double cross-validation study to predict 
forecasting accuracy revealed only one con- 
sistent predictor, the Names section of the 
Minnesota Clerical Test, which correlated 
+ .31 with skill in forecasting.

It is suggested that further studies of the 
role of perceptual skills and personality varia- 
bles in weather forecasting are needed. 


A Note on Small Samples


Edward N. Hay 
Edward N. Hay & Associates, Inc., Philadelphia, Pa 


The sample presents more delicate prob- 
lems than almost any other aspect of meas-
urement. To begin with, many psychologists
working on problems of testing have been in
the habit of going through many refined
operations to correct for the deficiencies of a
“small sample.” Small sample methods were
developed in agronomy, where it is possible 
to hold most of the variables reasonably con- 
stant. This is much less true in testing hu- 
man beings. Consequently, the mere applica- 
tion of small sample statistics cannot be 
expected to produce automatically more valid 
results than would be the case without them. 
Sometimes the characteristics of a sample 
are such that no amount of treatment will 
bring about a satisfactory result. 

The ceaseless search for “large samples” 
has resulted in many errors. A psychologist 
not long ago, in the course of an industry 
study, published norms which were the re-
sult of adding together a great many small
samples. It was not possible to make any 
check on the soundness of this operation be- 
cause of lack of information. However, large 
samples have frequently been made out of 
groups of small samples, when the charac- 
teristics of individual small samples differed 
widely. In one instance, the mean of one 
sample was more than one sigma away from 
the mean of another sample. The reasons 
for the difference could only be conjectured 
but certainly one larger sample was not to be
had by adding two small samples which dif-
fered this much.

Some years ago, I violated my own prin-
ciples by assembling a group of 120 subjects
from more than 20 departments of a single 
company. The resulting “hash” actually pro- 
duced reasonably satisfactory validity co- 
efficients just by luck. On another occasion, 
there were three departments of 10, 7, and 24 
employees respectively. Before adding them 
together to make a “large” sample, an ex- 
amination of the characteristics of all three 
groups was made. This revealed that there 
was little variability in the test scores for the 
two smaller groups. In the circumstances, 
neither one could produce a validity coeffi- 
cient or contribute to one when added to other 
samples. The larger group gave r’s exceed- 
ing .5 on a number of tests. 
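
The kind of examination described above, looking at each small sample before adding it to others, can be sketched as a simple screening routine. Everything here is hypothetical: the function names, the two thresholds, and the scores. The text specifies only the group sizes (10, 7, and 24) and the one-sigma example; the pooled within-group standard deviation is used as "sigma" purely as an assumption.

```python
from itertools import combinations
from statistics import mean, stdev


def describe(groups):
    """Return {name: (n, mean, sd)} for each group of test scores."""
    return {name: (len(s), mean(s), stdev(s)) for name, s in groups.items()}


def pooled_sd(stats):
    """Pooled within-group standard deviation."""
    num = sum((n - 1) * sd ** 2 for n, _, sd in stats.values())
    den = sum(n - 1 for n, _, _ in stats.values())
    return (num / den) ** 0.5


def pooling_warnings(groups, max_gap=1.0, min_sd_share=0.5):
    """List reasons for caution before combining the groups into one sample."""
    stats = describe(groups)
    sigma = pooled_sd(stats)
    warnings = []
    for a, b in combinations(stats, 2):
        if abs(stats[a][1] - stats[b][1]) > max_gap * sigma:
            warnings.append(f"means of {a} and {b} differ by more than {max_gap} sigma")
    for name, (_, _, sd) in stats.items():
        if sd < min_sd_share * sigma:
            warnings.append(f"{name} shows little variability")
    return warnings


# Hypothetical test scores for departments of 10, 7, and 24 employees.
departments = {
    "dept_1": [50, 51, 50, 49, 50, 51, 49, 50, 50, 50],
    "dept_2": [50, 50, 51, 49, 50, 51, 49],
    "dept_3": list(range(30, 78, 2)),
}
notes = pooling_warnings(departments)
```

With these made-up scores the two smaller departments are flagged for restricted variability, so neither could be expected to add much to a validity coefficient.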

Bransford has developed procedures for 
handling the criterion measures so as to be 
able to combine samples which are unlike in 
some degree; for example, ratings made by 
different raters. He spoke on this topic be- 
fore the Eastern Psychological Association at 
Atlantic City in 1952 under the descriptive 
title “Summational Within-Group Analysis.” 
Combining criterion measures usually presents
more difficulties than combining the scores
of tests of the same groups.


Received September 17, 1953. 
Early publication. 


The Measurement of Personality and Behavior Disorders by the
I. P. A. T. Music Preference Test


Raymond B. Cattell and Jean C. Anderson 


Laboratory of Personality Assessment and Group Behavior, University of Illinois 


On the wide research front which is roughly 
designated by “projective tests,” but perhaps 
more accurately by “misperception tests” 
(1), few recent advances have been so promis- 
ing as that connected with music perception. 
The powerful and immediate connection of 
musical stimulation with emotional experi- 
ence, and the many indications that uncon- 
scious needs gain satisfaction through this 
medium, have long pointed to measures of 
musical preference as effective avenues to 
deeper aspects of personality. Moreover, the 
lack of verbal content is itself, on general 
principles, a promise that the verbal, cogni- 
tive defenses of the censor may be by-passed 
and the emotional needs probed more directly 
without distortion by defense elaborations. 


The Music Preference Test 


Personality tests which proceed from the 
esthetic reactions of the subject, or from lik- 
ings and dislikings which cannot be based on 
logical, explicit relationships to the subject’s 
purposes and sentiments, occupy an area in- 
termediate between that of projective tests 
and that of other objective personality tests. 
For the liking or disliking is evidently due to 
characteristics imported or projected into the 
physical sounds by the listener, yet the “pro- 
jections” are not so explicit as in the imagery 
evoked by the Rorschach or the interpretive 
stories which the subject is asked to weave 
around the T.A.T. It is possible, therefore, 
that further research and clinical experience 
with this relatively unexplored class of tests 
(which may be called tests of “affective mis- 
perception”) will show them to have certain 
advantages over the standard projective or 
misperception tests. For sophisticated sub- 
jects intuitively realize that their cognitive 
projections stand in need of defensive dis- 
guise, whereas their likings and dislikings 
make no more sense to them than they do to 
the psychologist, before his statistical analyses
are made.

As in all test construction involving “items,” 
it would be foolish here to design psycho- 
logical measures hinging on the luck of a 
single response and to attempt to relate such 
a single response to personality dimensions. 
Instead we first seek reliability in the test 
measurement itself by composing it of scores 
on several items, thereby diminishing the 
effects of chance and specific historical as- 
sociations. This may be conceived as dis- 
covering the dozen or more “items” that can 
be validly added together to give a score on 
some single dimension of emotional quality 
or musical-emotional reactivity. Attempts to 
find these groupings by introspection or by 
psychiatric judgments must be set aside, for 
they are shown by preliminary research to be 
highly unreliable and to constitute an ama- 
teurish approach to the problem. Instead it 
is necessary to find the dimensions of musical 
choice by submitting a number of musical 
excerpts to a large population and correlat- 
ing the responses, thereby discovering em- 
pirically which responses “go together.” This 
first stage of research in the area has already 
been carried out by Cattell and Saunders (4) 
using 120 half-minute musical excerpts under 
conditions described elsewhere. 
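
The intercorrelation step described above can be sketched roughly as follows. This is only an illustration: the numeric coding of the L, I and D marks (1, 0, -1), the helper names, and the six toy listeners are assumptions and do not come from the original study, which used 120 excerpts and then factor-analyzed the resulting matrix.

```python
import math

# Assumed numeric coding of the Like / Intermediate / Dislike marks.
SCORE = {"L": 1, "I": 0, "D": -1}


def pearson(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(responses):
    """responses[s][i] is listener s's mark (L, I or D) on excerpt i."""
    n_items = len(responses[0])
    # One column of numeric scores per excerpt.
    cols = [[SCORE[row[i]] for row in responses] for i in range(n_items)]
    return [[pearson(cols[i], cols[j]) for j in range(n_items)]
            for i in range(n_items)]


# Hypothetical marks of six listeners on three excerpts.
responses = ["LLD", "LID", "DDL", "ILD", "DDL", "LLI"]
matrix = correlation_matrix(responses)
for row in matrix:
    print([round(r, 2) for r in row])
```

Excerpts whose responses correlate positively are candidates for the same factor; the factor extraction itself is not shown here.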

The psychologically interesting and re- 
assuring thing about this factor analysis of 
a matrix couched in a new variety of response 
correlations, namely, in music preference re- 
sponses, is that simple structure was as 
definitely obtained here as with ability tests, 
and that a comparison of two factorizations 
revealed a very gratifying degree of invari- 
ance of the factors. With this assurance from 
an initial study it is to be hoped that psy- 
chologists will be encouraged to face the vast 
amount of exacting work required by this ap- 
proach instead of being beguiled by merely 
esthetic intuitions in test construction. 

The two dovetailed factor analyses yielded
eleven stable factors (4). But before these
basic findings could become a practical
foundation upon which further “applied” re- 
search could readily go forward, in clinics 
and guidance centers generally, it was first 
necessary to construct out of the above re- 
search findings a convenient routine instru- 
ment. This was done under the auspices of 
the Institute for Personality and Ability Test- 
ing by the senior author and has issued in a 
12-inch long-playing record, reproducing 100 
half-minute music excerpts (50 on one side, 
Form A, and an equivalent 50 on the other, 
Form B). Except for the first and the last 
three factors in this test there are ten items 
provided to measure each factor. These items 
were chosen from the 120 factorized, accord- 
ing to the usual test construction principles: 
a significant loading on the factor concerned; 
a balancing (suppression) of loadings on fac- 
tors not concerned; a balancing of “like” and 
“dislike” responses in the score for any one
factor; no use of any item for more than one 
factor. A cyclical order of sampling of items 
from the various factors is used in the test as 
finally presented. 

The test so constructed, when cross vali- 
dated on a new population, was found to have 
consistency (split-half reliability) and equiva- 
lence (Form A vs. Form B) reliability co- 
efficients (2) that were adequate on only 
seven or eight of the eleven factors. See Table 
1. This inadequacy arises largely from some 
factors being measured on a bare minimum 
of 3 or 4 items in one form. Accordingly it 
is advocated that only seven or eight inde- 
pendent factors be routinely measured in 
standard clinical use and that the remaining 
three or four measures serve an exploratory 
purpose, as “located nuclei” from which fur- 
ther research can, by extension into new items, 
build up better factor scales. 

Meanwhile the test has been initially stand- 
ardized for every factor on a normal popula- 
tion of 380 student and non-student adults 
ranging from 18 to 68 years of age. The in- 
structions, which are given in standard form 
by the voice on the record, are set out below. 
The I.P.A.T. Music Preference Test of Per- 
sonality (3) is thus normally presented simply 
as a test of musical preferences, but the
implication that we were psychologically
interested in the results from the standpoint of
personality measurement was realized, at least 
by the normal group, in this particular ex- 
periment. 


First Issues Needing Research 


Now that such a measuring instrument is 
available, a number of researches immediately 
suggest themselves, especially in applied psy- 
chology. Concerning its promise as a per- 
sonality test it is at once apparent from in- 
spection of the actual musical excerpts found 
to be highly loaded in the various factors, 
that these factors are not merely cultur- 
ally-determined groupings, corresponding to 
musical “schools” or periods (with one possi- 
ble exception among the eleven factors: F 1). 
With this superficial interpretation rejected 
we may next examine the hypothesis that 
these factors correspond to what have been 
called major “hidden premises” in the logic
of personal preference (1). For these hidden 
premises of choice decision, according to our 
hypothesis as stated elsewhere (1), should be 
temperamental and early-environment-deter- 
mined dimensions of personality itself. 

If this is correct, there should be some sub- 
stantial correlations between these factors and 
the factors on the 16 Personality Factor Ques- 
tionnaire or any other measure of the primary 
personality factors. This at least is the 
hypothesis upon which the whole of the pres- 
ent investigation has been carried forward. 
If the musical choices are determined by
personality factors, i.e., by emotional needs and
constitutional tempers, we should expect, 
further, that various neurotic and psychotic 
syndromes, which are themselves explicable 
in terms of combinations of personality fac- 
tors, and sometimes in terms of single per- 
sonality factors, should show correlations 
with the musical choices. The immediately 
needed investigations, therefore, seem to be: 
(1) a study correlating the music factors 
with primary personality factors, in a normal 
group; and (2) a comparison of psychotics 
and normals in terms of musical preference 
factor profiles. 

The hypotheses that the music factors correspond
to needs or to temperamental factors
can be tested by this design, but one should 
also recognize that a third possibility exists 
—namely that the discovered music factors 
represent affective mood states, temporary 
dynamic stimulus conditions, physiological in- 
fluences, etc. This alternative, however, need 
not be investigated unless the present search 
for stable personality associates proves abor- 
tive. Some “function fluctuation” associated 
with mood will almost certainly -exist and it 
will attenuate our correlations. But if our 
hypothesis is correct that the major associa- 
tions will be found in relation to relatively 
stable personality structures, then it could 
seem better to track down this residual, “fluc- 
tuation” variance later. At that point not 
only the associations of the music factors with 
mood, but also the individual tendencies to 
high or low fluctuation on the music factors 
will bring in relationships of further impor- 
tance for understanding musical preference 
and personality. 

A fourth design of research which is also 
immediately needed is a factorization of a 
population of psychotics, to see whether the 
structure of factors is the same there as in a 
normal group. Unless there is some fairly 
close resemblance of the factor structure in 
the two groups, it would indeed be illogical 
to measure psychotics on the same dimensions 
as those found among normals. Accordingly, 
we have also gathered data for factorization 
of the same 120 excerpts on a population of 
100 psychotics, and this will be intercorre- 
lated and factorized if statistical man-hour 
resources can be provided by the Music Re- 
search Foundation. 

The general reaction of cultivated listeners 
to the above propositions has been that our 
hypotheses neglect the role of intellectual and 
cognitive functions in musical appreciation. 
Our argument is that these functions are not 
primary but are only means to ends (technical
rationalizations of the aesthete, perhaps
changing superficially with cultural climate)
for satisfactions which are deeper and more 
stable. Initial experimental support for our 
position is given by the fact that the music fac- 
tors do not apparently correspond in content to 
cultural or technical dimensions. A research
designed to tackle this question more posi- 
tively has meanwhile been set in motion. It 
consists of an experiment in which fifty 
choices in pictorial art, thirty choices in 
architecture, and forty choices in sculpture 
are intercorrelated and also correlated with 
the factors in musical choice. If the same 
factorial dimensions appear here, aligning 
themselves with the music factors, and cutting 
across periods and cultural integrations, there 
will be additional evidence that we are pro- 
ceeding beyond technical, cultural or his- 
torical patterns. 


Design of the Experiment 


The first part of our investigation, that 
with normal subjects, called for the administration
of the Music Preference Test to a
normal population which should be: (1) well
varied in personality; and (2) simultaneously
measured on a sufficiently reliable and valid 
measure of the primary personality factors. 
The main contribution to the test population 
consisted of 102 male and female subjects, 
76 of whom were University of Illinois students,
ages 18 to 29, and 26 of whom were
“general adults,” ages 30 to 81. The re- 
mainder were tested in a second sub-group 
consisting of 55 students, both male and fe- 
male, ages 17 to 28. Since we needed to 
apply a personality test which deals with 
primary and independent personality dimen- 
sions of known associations we employed the 
I.P.A.T. 16 Personality Factor Questionnaire, 
which is also convenient for group adminis- 
tration with reasonably literate populations. 
The 16 P.F. includes intelligence as one di- 
mension. Each of the 157 subjects, there- 
fore, took a one-hour music preference test in 
which both forms A and B of the music test 
were administered, and a half-hour silent
session in which Form A of the 16 P.F. Test 
was administered. The instructions in the 
Music Preference Test are on the beginning of 
the record, and are as follows: 


“This is a test of your likings and dislikings in 
music. Your score has nothing to do with how 
much you agree or disagree with popular tastes, 
but only with how much you agree with yourself; 
that is, with how consistent¹ you are. So try to
say, as each piece is played, whether you your- 
self like it; whether it is pleasant, so that you 
would like to hear more of it, or whether you 
would just as soon have it switched off. 

“On the score sheet before you are numbers 
for the fifty pieces that will be played, each for 
less than half a minute. As each comes to an 
end, underline L, I, or D, opposite that number, 
indicating you like it, or have an intermediate, 
indifferent reaction, or dislike it. Dislike does 
not mean that you hate it, but only that you 
don’t particularly like that kind of music. In 
fact you should aim to have just as many D’s as 
L’s underlined when you get to the end. Try 
not to use I for intermediate more than you need. 
In fact, you should expect to end up with very 
roughly one-third L’s, one-third I’s and one-third
D’s. But don’t bother about that too much. 
Just give your reactions as truthfully as possible.”


The administration of the Music Preference 
Test to a group of psychotics took place at 
Kankakee State Hospital, Kankakee, Illinois. 
In this case the subjects were taken in small 
groups of three or four at a time, in order that 
it might be ascertained that they were appro- 
priately responding on the answer sheets to 
every piece of music. It is well known that
diagnoses in different mental hospitals do not
agree very highly (as shown on the individual
cases transferred from hospital to hospital), 
and that the very proportions of manic-de- 
pressives, schizophrenics, hysterics, and other 
psychotic syndromes, as diagnosed in different 
institutions, may vary considerably. As usual 
a good deal of difficulty was experienced in
obtaining a sufficient sample of some psychia- 
tric syndrome groups. In accepting the group 
divisions finally used the criterion for classi- 
fication was naturally the hospital diagnosis 
as reached in case conferences. A total group 
of 98 psychotic patients was obtained consist- 
ing of 36 alcoholics, 22 schizophrenics of 
mixed types, 10 manics, 7 paranoids, and 23 
of other categories each not sufficient in num- 
ber for separate use in our study. The sub- 
jects were both male and female, the age 
range being approximately 25 to 60 years. 

¹ This obviously asks the person to be “true to
himself” and to give his considered judgment; with
advanced music students on the other hand it might
be interpreted as being consistent with regard to
musical “schools,” but our subjects were not music
students.

Results for the Normal Personalities


The findings for the normal group will first 
be described. Our initial interest turns on the 
reliabilities, a minority of which, as men- 
tioned above, were low enough to suggest 
dropping certain factors. These correlations 
are presented first as consistency (split-half)
coefficients in Table 1, Part A and secondly 
as coefficients of equivalence (correlation of 
Form A with Form B) in Part B of Table 1. 

The equivalence coefficients perhaps do not 
do justice to the tests because the highest 
loaded items were in every case put in the 
A form, since, when psychometrists are un- 
able to use the full length test, it is the A 
form that they will use. This reduces the 
equivalences (columns 5 and 6) below the 
consistency coefficients (columns 2 and 4) 
which more truly represent the internal con- 
sistency, and are defective—for a 10-item 
length of scale—only on factors 3, 9 and 10, 
recommended to be dropped. 
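
The step from a half-length consistency coefficient to a full-length one is the Spearman-Brown correction named in Table 1. A minimal sketch, assuming the usual form of that formula; the function name and the two illustrative half-length values (.71 and .06) are chosen here for demonstration only.

```python
def spearman_brown(r_part: float, length_factor: float = 2.0) -> float:
    """Estimate the reliability of a test `length_factor` times as long
    as the part-test whose reliability is `r_part`."""
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)


# A well-behaved scale and a weak one, each doubled in length.
for r_half in (0.71, 0.06):
    print(r_half, "->", round(spearman_brown(r_half), 2))
```

Doubling the length raises a half-length coefficient of .71 to about .83, while a coefficient of .06 only reaches about .11, which is why very short or inconsistent scales gain little from the correction.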

The correlations between the sixteen factors 
of the 16 P. F. Test and the eleven factors of 
the Music Preference Test were worked out 
separately for the two populations, as a mu- 
tual check. For economy of representation 
the values in Table 2 are blanks except where 
the correlations on the two samples are of the 
same sign and both beyond the 1% level of 
significance. Then a single value—the mean 
correlation (Fisher’s z)—has been corrected 
for attenuation, by the given reliabilities of 
the Music Preference and 16 P. F. Test meas- 
ures, and recorded in Table 2. 
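
The two operations mentioned above, averaging two sample correlations through Fisher's z and correcting the result for attenuation, can be sketched as follows. Weighting each z by n - 3 is an assumption (the text does not say whether the mean was weighted), and all numbers in the example are hypothetical.

```python
import math


def fisher_mean_r(r1: float, n1: int, r2: float, n2: int) -> float:
    """Average two correlations via Fisher's z transformation.

    Each z is weighted by n - 3 (an assumed, conventional choice).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    w1, w2 = n1 - 3, n2 - 3
    return math.tanh((w1 * z1 + w2 * z2) / (w1 + w2))


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation between two measures free of unreliability."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical correlations of one music factor with one personality
# factor in the two samples (n = 102 and n = 55).
mean_r = fisher_mean_r(0.30, 102, 0.40, 55)
# Hypothetical reliabilities of the two scales.
corrected = correct_for_attenuation(mean_r, rel_x=0.83, rel_y=0.70)
print(round(mean_r, 3), round(corrected, 3))
```

Because the corrected value divides by the square root of the product of two reliabilities, low reliabilities inflate the estimate considerably, so corrected coefficients from the weaker scales should be read with caution.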

None of the correlations is large enough to 
demonstrate a one-to-one relation between the 
music factors and the personality factors. 
But the set of 16 P. F. Test factors associated
with any one music factor has a psychologically
consistent and compatible character
among the members in every case. For ex- 
ample, the personality factors correlating sig- 
nificantly with music factor No. 1 are domi- 
nance, surgency, toughness, radicalism and 
self-sufficiency—all possibly related to some 
second-order, comprehensive factor of temperamental
toughness. Furthermore (and alternatively) the
relative magnitudes of the correlations are such
as could be compatible with a one-to-one
relationship of music and personality factors if
chance experimental error and the existing
specious correlations among the factors within
both the personality and the music area could be
eliminated (notably by longer scales for each
factor and by dropping items in one factor scale
having any correlation with another factor). A
test of this possible explanation must await much
further work on the purification of the present
factor scales. Meanwhile, however, this
explanation rests on the indication that the
highest correlation for a given music factor
with any personality factor is also the highest
correlation for the personality factor with
any music factor. For example, factor 2 has
its highest r with Q4, which r is also Q4’s
highest r with anything; factor 3’s highest is
with H, which is also H’s highest; the factor
4 column has its highest with M, which is also
the highest in the M row, and so on, with very
few exceptions (notably factor 8).


Table 1

Reliability Coefficients for Factor Measurements

Part A. Consistency Coefficients (Whole Group)
Columns: Factor; Half-Length Coefficient; No. of Items
in Half; Spearman-Brown Corrected to Full Length

Part B. Equivalence Coefficients (Form A with Form B)
Columns: Sample of 102 Persons; second sample

[Table entries are not recoverable from the source scan.]


Table 2

Correlations of Music Preference Factors and Personality Factors

Rows: 16 P.F. Factors; columns: Music Preference Factors

[Table entries are not recoverable from the source scan.]
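
The reciprocal-highest check described in the text (a music factor's strongest personality correlate should in turn have that music factor as its own strongest correlate) can be sketched as below. The pairings 2 with Q4, 3 with H and 4 with M follow the text; the correlation values themselves are hypothetical, as is the function name.

```python
def mutual_best_pairs(corr):
    """Return (music factor, personality factor) pairs whose correlation
    is the largest in absolute size in both its row and its column.

    corr[m][p] is the correlation of music factor m with personality factor p.
    """
    pairs = []
    for m, row in corr.items():
        # Strongest personality correlate of this music factor.
        p_best = max(row, key=lambda p: abs(row[p]))
        # Strongest music correlate of that personality factor.
        m_best = max(corr, key=lambda k: abs(corr[k].get(p_best, 0.0)))
        if m_best == m:
            pairs.append((m, p_best))
    return pairs


# Hypothetical excerpt of a music-by-personality correlation table.
corr = {
    2: {"Q4": -0.40, "H": 0.10, "M": 0.05},
    3: {"Q4": 0.15, "H": 0.35, "M": 0.10},
    4: {"Q4": 0.05, "H": 0.20, "M": 0.45},
}
print(mutual_best_pairs(corr))  # [(2, 'Q4'), (3, 'H'), (4, 'M')]
```

A music factor absent from the returned list would be an exception of the kind the text notes for factor 8.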

As to the consistency of psychological 
meaning among personality factors associated 
with a given music factor we may mention, in 
addition to factor 1 above, that factor 2 cor- 
relates negatively both with paranoid tend- 
ency and nervous tension, which tendencies 
have been previously found associated by 
Darling (2); and that factor 4, which cor- 
relates essentially with M (‘“Unconvention- 
ality vs. Practical Concernedness”’), also has 
some association with Q2 (“Independent Self- 
sufficiency”). The alternative possibility is 
thus indicated, as suggested above, that where 
a music factor does not align itself with a 
first-order personality factor it may prove on 
further research to correspond to a second- 
order factor uniting the personality factors 
in some underlying common influence. For 
this reason music factor 1 has been called 
“Tough Sociability vs. Tenderminded Individuality,”
which contingently restricts the
meaning pretty closely to the psychological 
bi-polarity of personality factor I, with which 
it is most associated, but also suggests fea- 
tures of the other factors with which it has 
some degree of association. The over-all de- 
scription of the personality dimension as- 
sociated with this particular music factor thus 
becomes remarkably similar to the Tender- 
vs. Tough-minded continuum described by 
William James (6). 


Results for the Abnormal Personalities 


As stated above, in the account of design, 
the test was administered to 98 hospitalized 
psychotics, divided into those four major syn- 
drome groups which had each a sufficient 
number of well-diagnosed cases to promise 
some significance of differences, if such should
exist. 

The means and sigmas on all 11 factors are 
shown for normals, for abnormals as a whole,
and for the four abnormal syndrome groups,
in Table 3.

The differences are examined below by the 
t test, first with respect to the differences be- 
tween the main psychotic group and the psy- 
chotic sub-groups, on the one hand, and the 
normal group on the other, with results as 
shown in Table 4. Nothing below a 10% 
probability is recorded in the P column. 
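
A t value of the kind entered in Table 4 can be computed from the group means, sigmas and sample sizes alone. The sketch below assumes a pooled-variance two-sample t test and treats each sigma as a sample standard deviation; the text does not state which form of the t test was used, and the factor scores in the example are hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics.

    Returns the t value and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_error, df


# Hypothetical scores on one music factor: normals vs. abnormals.
t, df = pooled_t(mean1=11.6, sd1=2.8, n1=369, mean2=10.0, sd2=3.2, n2=98)
print(round(t, 2), df)
```

The resulting t is then referred to a t table at the stated degrees of freedom to obtain the percentage level reported in the P column.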


Table 3

Scores of Normal and Abnormal Groups

Columns (Mean and Sigma on each of the 11 factors):
Normals (n = 369); Abnormals (n = 98); Alcoholics (n = 36);
Schizophrenics (D-P) (n = 22); Manics (n = 10); Paranoids (n = 7)

[Table entries are not recoverable from the source scan.]

Table 4

Significances of Differences of Abnormal Groups from Normals

Factor   Total Abnormal    Alcoholics      Schizophrenics  Manics  Paranoids
         t      P          t      P        t               t       t
 1       1.7    5-10       2.0    2-5      0.3             1.4     0.5
 2       3.9    -1*        5.6    -1       0.7             0.4     0.8
 3       2.4    1-2        1.5             1.8             1.1     0.2
 4       6.7    -1         ?               3.3             1.6     0.2
 5       2.0    2-5        0.2             1.1             0.6     3.6
 6       7.3    -1         6.2             ?               2.4     1.0
 7       4.3    -1         5.8             0.9             2.0     1.2
 8       ?                 2.0             1.0             1.2     0.3
 9       8.0    -1         4.8             4.7             2.4     1.5
10       1.2               0.4             0.9             0.4     1.6
11       1.7               1.0             1.2             3.1     ?

P = % level of significance. ? = entry illegible in the source;
the P entries for the last three groups are likewise illegible.
* This indicates “beyond the 1% level.”


The psychotics differ, beyond the 1% level,
in being lower on factor 2, lower on 4, higher
on 6, lower on 7, and higher on 9. These
differences similarly characterize the alcoholics,
who happen to be the largest group, though
still constituting only 36 out of 98 psychotics.
The schizophrenics differ at the 1% level only
by being higher on 6 and 9. The manics have
similar tendencies on these factors (2-5%
level), but also come up with a new difference
(1-2%), by being lower on factor 11. The
paranoids have no resemblance to the alcoholic
and schizophrenic majority, but share
the manic’s lower score on 11 and show a new
pattern in being lower (1% level) on 5.

Before commenting on these findings let 
us examine, finally, the capacity of the test to 
discriminate among various psychotic syn- 
drome groups themselves. The test examina- 
tion is presented in Table 5. 


Table 5

Significances of Differences of Syndrome Groups, from Total Psychotic Group and One Another†

Columns (t and P, % level of significance, on each of the 11 factors):
Psychotics vs. Alcoholics (n = 98 & 36);
Psychotics vs. Schizophrenics (n = 98 & 22);
Psychotics vs. Manics (n = 98 & 10);
Psychotics vs. Paranoids (n = 98 & 7);
Manics vs. Schizoids (n = 10 & 7)

[Table entries are not recoverable from the source scan.]

† Only the noteworthy levels of significance (beyond 10%) are entered.

It will be seen that the schizophrenics above
have no significant differences and comprise, 
as it were, the prototype of psychosis. The 
alcoholics, in spite of being the largest group 
contributing to the “mean psychotic,” differ 
very significantly by being lower on 2 and 7. 

Manic-depressives show a distinct, char- 
acteristic pattern which, in spite of the small 
numbers, is statistically significant, both in 
relation to general psychotics and to schizo- 
phrenics. From both of the latter they differ 
by being higher on factor 4, and from the 
total psychotics by being lower on factor 11. 
With respect to factor 4 the manic-depres- 
sives and the schizophrenics fall on opposite 
sides of the normal mean, which suggests that 
this factor has close connection with the di- 
mension envisaged by Bleuler, Kretschmer, 
and others. It is interesting to note that the 
paranoids share some of the characteristic 
differences of both schizophrenics and manics, 
but have one additional divergent factor and 
finish with a uniquely characteristic profile. 

These results, if confirmed on another 
sample, indicate that the test is a powerful 
means of psychiatric diagnosis, for if differ- 
ences on single factors exist at such levels of 
statistical significance the prediction from the 
combination of factors in this pattern should 
yield substantial separation of the two groups. 
For example, since the factor measures are in 
principle independent, the difference of the 
normals and abnormals would be significant 
approximately at the (1/100)⁵ level, and the
resulting absence of any substantial overlap 
between the two distributions should make 
prediction even on the individual case highly 
reliable. As far as an exploratory study per- 
mits we can roughly indicate the diagnos- 
tically useful patterns as follows: (1) to dis- 
tinguish psychotics from normals: low 2, low 
4, high 6, low 7, high 9; (2) alcoholics should 
be similarly distinguished, but also by being 
especially low on 2 and 7, which pattern 
should further distinguish them from other 
psychotics; (3) paranoids distinguish from 
normals by low 5 and low 11; and so on for 
other pairs of groups. 
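
The joint significance level invoked above is simply a product of the single-factor levels, and it holds only if the factor measures are truly independent. A minimal arithmetic sketch, assuming the five factors (2, 4, 6, 7 and 9) on which psychotics differ from normals beyond the 1% level; the function name is illustrative.

```python
def joint_level(p_single: float, k: int) -> float:
    """Chance that k independent tests all reach level p_single by accident."""
    return p_single ** k


# Five factors, each separating the groups at the 1% level.
level = joint_level(0.01, 5)
print(f"{level:.0e}")
```

Any correlation among the factor scales, such as the text itself acknowledges, would make the true joint level less extreme than this product suggests.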

Examined in terms of the meanings of the 
correlations found between these music factors
and normal personality factors these psychotic
patterns are psychologically consistent
and recognizable. But the psychotic associa- 
tions also throw further light on the psy- 
chological meaning assigned to the music 
factors. Thus, in terms of the labels now 
assigned to the music factors in the hand- 
book (3) the paranoid pattern combines the 
factor of “Paranoid Imperviousness vs. Overt 
Anxiety” (Low 5) and “Schizothymia” (Low 
11). The alcoholics combine “Frustrated 
Emotionality” (Low 2) and “Withdrawn 
Schizothymia” (Low 7), incidentally corresponding
to the 16 P. F. Test factors (see
Table 2) known as Q4 (Nervous Tension),
and H(-) (Withdrawn Schizothymia), which
pattern well fits the published descriptions
and analyses of the dynamics of alcoholism.
The manics distinguish from normals by being
high on the factor of Eccentricity, on Dominance,
and on Frustrated Emotionality (i.e.,
on music factors 6, 9, and 11 (-), corresponding
to 16 P. F. factors C(-) [with
others], E and Q4). The original general
interpretation of the Q4 factor of “Jitteriness”
or “Somatic Anxiety” as “Frustrated Emotionality”
is strengthened by this association of
the factor with both alcoholism and mania, 
and by its absence from the schizophrenic 
profile. Similarly light is thrown mutually on 
the alternative escapes of alcoholism and 
manic excitement, by the association of the 
“Withdrawn Schizothymia” (16 P. F. factor
H(-)) with the former, and of “Eccentricity”
and “Dominance” with the latter.
With increasing investigation of the physio- 
logical, social and dynamic meaning of such 
unitary, measurable factors, as established in 
normal populations, the way toward causal 
explanation of the psychoses could become 
much more clear. 

Space does not permit here any extensive 
discussion of the relation of the personality 
associations of the music factors to the char- 
acter of the music per se, in the factor items. 
However, one may note that the psychotic 
group seems to prefer, according to the musi- 
cal items in factors 2, 3 and 4, music that is 
relatively slow and simple (and also rela- 
tively “sad”). Further, from the difference 
on factor 7 it can be added that they tend 
to avoid brightly colored (harmonically and 
texturally) music in favor of clear harmonic
progressions, sweet melodies and subordinate 
accompaniment. The exception to this pat- 
tern is the manic group, which, on its distin- 
guishing factor (No. 4), prefers fast, ex- 
hilarating, stimulating pieces with textural 
complication, rhythmic variation and less
obvious melodic outlines. These associations 
might roughly be explained in terms of em- 
pathy, but as more evidence accumulates 
they should receive more direct research in- 
vestigation, especially in the light of such 
research approaches as those of Rigg (7, 8). 


Summary 


1. A previously completed factor analysis of 
120 very diverse musical excerpts was used 
as a basis for construction of a Music Prefer- 
ence Test of Personality, set up to measure 
eleven factors by 100 items on two sides of a 
long-playing (33⅓ R.P.M.) record. As the
equivalence of the A and B forms is inade- 
quate for three or four of the factors, it is 
recommended that these be reserved for re- 
search improvement, by item analysis, and 
that the remaining seven or eight factors alone 
be used as internally valid measures in routine
applied psychology, notably in seeking
external validities by predictions in clinical 
and guidance psychology. 

2. Since the established groupings of items 
do not correspond to musical schools or 
periods (though possessed of some consistency 
of musical character) it is hypothesized that
they represent dimensions of personality
(especially of temperament) determining 
taste. Correlation with the 16 Personality 
Factor Questionnaire Test, on normal popu- 
lations of 102 and 71, confirmed this by yield- 
ing many significant correlations. 

A one-to-one relation of music preference 
and personality factors cannot be proven by 
these results, since both measures of factors 
are imperfect. But the correlations, corrected 
for attenuation, are at least consistent with 
the hypothesis that, but for contamination, 
the same personality dimensions determine, in 
all but two cases, both the verbal and the 
music preference factors. Contingent titles
have been given to the music preference fac- 
tors in accordance with the personality as- 
sociations. These titles proceed on the prob- 
ability that most music factors are primary 
personality factors though some may be 
second-order personality factors. 

3. Application of the Music Preference 
Test to 98 patients in mental hospitals re- 
vealed several factor measure differences, sig- 
nificant at the 1% level, between psychotics 
and normals and between various psychotic 
syndrome groups. If confirmed on further 
samples, these pattern differences are so 
marked as to make the test a valuable adjunct 
to psychiatric diagnosis. The meaning of the 
music factors as indicated by the personality 
factor correlations agrees well with the mean- 
ing as found independently in terms of the 
associations with psychotic syndrome groups. 
These scales might therefore have value in 
throwing further light on individual psychotic 
syndromes.


A Rating-Scoring Method for Free-Response Data


Ralph R. Canter, Jr. 


University of California, Berkeley 


In a study designed to evaluate a human 
relations training program for executives and 
supervisors (reported in detail elsewhere, 1) 
a forced-normalizing method was employed to 
score written answers to open-ended questions. 
Answers to four questions contained in a 
specially prepared Supervisory Questionnaire 
were evaluated by four raters in accordance 
with the procedures outlined in this paper. 
This questionnaire was one of a number of 
tests and other questionnaires administered 
to an experimental (trainee) group and a 
matched control group. The N was 18 in 
each group. 

The questions used in the Supervisory 
Questionnaire were developed in light of three 
considerations: (a) the trainees should be 
given an opportunity to express themselves in 
their own words concerning the kinds of prob- 
lems they had in their jobs; (b) the questions 
should not be drawn from the course content, 
but should be directed toward the individual 
supervisor in his job; and (c) the questions 
should not be structured in such a fashion 
that certain kinds of answers would seem im- 
portant. As an example, one question used 
was: “Do you feel you have the kind of 
cooperation from your employees that you 
want? What do you think accounts for 
this?” 

Rationale and Procedure ¹

It was hypothesized that if we were to take 
all the written answers to a single question 
made by the persons in both the experimental 
(E) and control (C) groups and have raters 
sort them into n categories, the E and C pre- 
test distributions should be almost identical 
and have the same mean. However, it was
thought the E responses following training
would be distributed by the raters in such a
manner that the E mean would be reliably
higher than the C mean, thus enabling us to
conclude that the training was effective in
producing change along the dimensions of the
questions used.² The forced normal distribu-
tion of judgments was considered an effec-
tive method to use in this situation. The de-
rived procedures will be described in the con-
text of the investigation.

¹ The writer is indebted to Dr. F. M. Fletcher of
Ohio State University for assistance in developing
this method. Thanks for services as raters are ac-
corded to Dr. J. H. Hemphill and Dr. M. S. Seeman
of the Personnel Research Board, Ohio State Uni-
versity, and Dr. E. E. Ghiselli and Mr. Richard
Barthol, University of California.

Since there was a total of 36 E and C re- 
sponses to each question, a normal area dis- 
tribution with an N of 36 was determined for 
seven categories, this number being arbitrarily 
used because of the small N and the relative 
ease for the raters. The numbers of cases 
falling in the respective categories were: 1, 3, 
8, 12, 8, 3, and 1. 
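
The paper does not say how the normal-area distribution was worked out. As a hedged sketch only: if the unit normal curve is cut between -3 and +3 sigma into seven equal intervals, the two end categories absorb the tails, and each expected frequency is rounded to a whole case, the counts reported above are reproduced. The function name and the 3-sigma span are illustrative assumptions, not the author's stated procedure.

```python
from statistics import NormalDist


def forced_normal_counts(n_cases, n_categories, span=3.0):
    """Whole-number category frequencies under a unit normal curve.

    Assumption (not stated in the paper): the range -span..+span sigma is
    cut into equal-width categories and the end categories absorb the tails.
    """
    cdf = NormalDist().cdf
    width = 2 * span / n_categories
    # Cumulative area at each interior cut point, bracketed by 0 and 1.
    cuts = [-span + width * k for k in range(1, n_categories)]
    cumulative = [0.0] + [cdf(z) for z in cuts] + [1.0]
    return [round(n_cases * (hi - lo))
            for lo, hi in zip(cumulative, cumulative[1:])]


counts = forced_normal_counts(36, 7)
print(counts)
```

Rounded frequencies need not sum to N for every combination of N and number of categories, so a rater's quota sheet built this way should be checked and adjusted by hand when they do not.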

Four raters were used, all being social 
scientists experienced in dealing with written 
questionnaire item responses and having spe- 
cific knowledge of desirable supervisory prac- 
tices and qualities. Each was given written 
instructions, a summary of which follows. 
The general nature of the task was described 
and the specific questions were listed. The 
rater was asked to judge the responses to these 
questions in terms of the degree to which they 
reflected over-all supervisory quality. The 
rater was told that the responses came from 
Ss in E and C groups, but that the procedure 
required him to be in ignorance of whether a 
respondent was in the E or C group. He was 
then instructed to sort the responses to the 
first question into the seven categories in ac- 
cordance with the assigned numbers (i.e., 12 
responses in Category No. 4, 8 each in Cate-
gory No. 5 and No. 3, and so on). The exact
procedure was described, essentially involving
separation of the best and poorest responses
at each step. He next proceeded to the
second question and so on. The rater was
never informed as to whether he was dealing
with a set of pretest or posttest responses.

² In the major study (1) the trained supervisors
were found to have gained in mean score at a sta-
tistically significant level of confidence over the un-
trained supervisors on the Supervisory Questionnaire.
This measure also intercorrelated highly with other
tests and measures on which statistically significant
gains were found.

Each response was scored by summating
the category values assigned by the four
raters. The number of a category was used
as a score. For example, the four judges may
have respectively placed a response in the
following numbered categories: 2, 3, 3, and 4.
The response would receive a score of 12.
Each individual’s four scores were then sum-
mated, this being his over-all questionnaire
score (which was treated statistically within
the framework of the larger investigation).
Records by separate item scores and by
total scores were kept for each rater so that
inter-rater reliability could be estimated.
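
The scoring rule just described can be sketched in a few lines. Only the first set of category placements (2, 3, 3, 4) comes from the text; the other three sets and the function names are hypothetical, supplied to show how a respondent's over-all score is built up.

```python
def response_score(category_values):
    """Score for one response: sum of the category numbers from all raters."""
    return sum(category_values)


def questionnaire_score(placements_by_question):
    """Over-all score for one respondent: sum of his response scores."""
    return sum(response_score(p) for p in placements_by_question)


# One respondent, four questions, four raters per question.
# The first row is the example from the text; the rest are made up.
placements = [
    [2, 3, 3, 4],
    [4, 4, 5, 4],
    [3, 3, 3, 3],
    [5, 6, 5, 4],
]

print(response_score(placements[0]))
print(questionnaire_score(placements))
```

With seven categories and four raters, a single response can score from 4 to 28, and the four-question total from 16 to 112.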


Inter-Rater Reliability 


Table 1 contains the Pearsonian correlation
coefficients between raters on the summated
questionnaire score (i.e., total of scores as-
signed by each rater to each respondent on
each of the four items) for both the pretest
and posttest. The inter-rater correlations are
not reported for the four separate items; we
wish only to note that the range of these cor-
relations on the pretest was from 0.31 to 0.71
with a mean of 0.47, and 0.26 to 0.79 on the
posttest with a mean of 0.49.


Table 1

Inter-Rater Reliability Coefficients for Supervisory
Questionnaire Summated Scores

Pretest
Rater      A        B        C
B         .52
C         .52      .63
D         .54      .63      .66
Average Intercorrelation Coefficient*: [not legible]
Total Summated Questionnaire Score Reliability
(Corrected by Spearman-Brown formula with
four raters): [not legible]

Posttest
Rater      A        B        C
B         [illegible]
C         .69      [illegible]
D         .68      .70      [illegible]
Average Intercorrelation Coefficient*: [not legible]
Total Summated Questionnaire Score Reliability
(Corrected by Spearman-Brown formula with
four raters): .88

* Obtained by formula 118, p. 197, Peters and Van
Voorhis (3).
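
As an illustration of the Spearman-Brown step named in Table 1, the sketch below steps up an average inter-rater correlation to the reliability of the four-rater total. It uses the six pretest coefficients that are legible in Table 1 and, as an assumption, a simple arithmetic mean in place of formula 118 of Peters and Van Voorhis, which may average differently. The result is therefore illustrative and is not the value printed in the paper.

```python
def spearman_brown(mean_r, k):
    """Reliability of the sum of k raters, given their average intercorrelation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Pretest inter-rater correlations legible in Table 1.
pretest_r = [0.52, 0.52, 0.63, 0.54, 0.63, 0.66]

# Assumption: plain arithmetic mean, not the paper's formula 118.
mean_r = sum(pretest_r) / len(pretest_r)
reliability = spearman_brown(mean_r, k=4)

print(round(mean_r, 3), round(reliability, 3))
```

The same function shows why pooling raters helps: a single-rater correlation near .58 rises to roughly .85 when four raters are summed.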


Discussion 


The inter-rater reliabilities appear to be 
quite adequate and within the usual range of 
reported reliabilities. Rating each question 
separately has the effect equivalent to adding 
more raters (2). Also, it is possible to as- 
sume that a fairly high degree of unidimen- 
sionality is accorded to each item (the rater 
has only a single question to keep before 
him). The responses can be viewed as homo- 
geneous since the rater has only a single 
criterion to keep in mind—in this study good- 
ness of response as related to supervisory 
quality. These conditions act to increase 
reliability. 

In using a technique such as this it ap- 
pears that some of the hazards involved in 
trying to get scales can be avoided. How- 
ever, Suchman (4) has pointed out the dif- 
ficulties with “non-itemized” judgments or 
ratings, noting especially that such proce-
dures produce no definition of the variable
under consideration. With this we must con- 
cur. But much depends upon the uses to be 
made of such ratings. In the example used 
the major intent was to determine whether 
the training appeared to have any effect at all 
on the trainees’ free responses about how they 
performed in their jobs. 

Subsequent studies would be required to 
specify the correlations between the particu- 
lar training content and the observed effects, 
as usually is the case. From this standpoint 
the method proposed here is best viewed as 
one which determines whether further stud- 
ies are warranted. 


Summary 


A rating-scoring technique for evaluating 
free response answers was developed through 
the use of a forced normal distribution of 
judgments. An example of its use in a study
evaluating a human relations training course
was described wherein the criterion used by
four judges was over-all supervisory quality
as revealed in pretest and posttest written
responses made by experimental and control
subjects in regard to their job performance.
Satisfactory inter-rater reliabilities were
found.


For the purpose of evaluating the potential
usefulness of a student evaluation program 
of college faculty members, an investigation 
was initiated to determine the effect of grad- 
ing leniency upon merit rating scores of 
faculty members in the School of Business 
and Industry. To the extent that grading 
leniency was highly correlated with merit rat- 
ing scores, the usefulness of student evalua- 
tion could be seriously questioned. 

As a corollary of the basic study, the rela- 
tionship between absence extensiveness and 
faculty ratings was investigated to supple- 
ment data uncovered in a previous study 
about the effectiveness of an unlimited ab- 
sence regulation. To the extent that highly 
rated instructors attract students to their 
classrooms despite an unlimited absence regu-
lation, restricting the number of absences
which students are allowed may merely serve
to bolster the security feelings and egos of 
lowly rated instructors, and further frustrate 
the students who are engaged in the program 
of evaluating faculty members. 


Procedure 


Grading leniency was determined by de- 
riving the arithmetic mean of quality points 
issued by each faculty member to his stu- 
dents, and then, by ranking each faculty 
member according to obtained means. Owing 
to the selection process operating during a 
four year college curriculum, the average 
class grade increases progressively from the 
freshman to the senior level. Consequently, 
in order to control variation based upon aca- 
demic level, rather than upon faculty leniency, 
separate ranking distributions were made for 
freshman-sophomore and junior-senior levels. 

The determination of absence extensiveness
was accomplished by ranking nineteen faculty
members according to the median number of
class absences accumulated in their classes.
Ranking distributions were made for fresh-
man-sophomore and junior-senior levels, as
well as the freshman through senior levels
combined.

¹ Now at Oklahoma A & M College.

Since the evaluation data of faculty mem- 
bers were available in ranked form, Spearman 
rho was used to determine the relationship 
between grading leniency and merit rating 
scores. The same technique was used to un- 
cover the relationship between absence exten- 
siveness and merit rating scores. In addition, 
multiple correlation analysis was used to de- 
termine the combined effect of grading leni- 
ency and absence extensiveness upon the merit 
rating scores of faculty members. 
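
For readers unfamiliar with the rank-difference coefficient, a minimal sketch follows. It uses the standard formula rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), which holds when there are no tied ranks. The two rankings are hypothetical and are not data from this study.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-difference correlation for two untied rankings."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical example: five instructors ranked on leniency and on rating.
leniency_rank = [1, 2, 3, 4, 5]
rating_rank = [2, 1, 4, 3, 5]

rho = spearman_rho(leniency_rank, rating_rank)
print(round(rho, 2))
```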

Absence and grading data are based upon 
reports which 19 faculty members submitted 
to the Dean of the School of Business and In- 
dustry. Faculty evaluation data are based 
upon information secured during regular class 
periods by members of an honorary scholastic 
fraternity after mid-semester grades were 
posted, but before final grades were assigned. 
Instructors were rated on an eight-point
graphic rating scale which permitted distribu- 
tion of judgments along a continuum of five 
verbally described points. The following fac- 
tors were used: 1. knowledge of the subject; 
2. class preparation; 3. clarity of speech; 4. 
avoidance of sarcasm; 5. fairness in grading; 
6. absence of mannerisms; 7. creation of in- 
terest in subject matter; and 8. ability to con- 
trol temper. 

Under provisions of the evaluation program, 
an instructor left his classroom earlier than 
usual and permitted students to rate him dur- 
ing his absence. Whenever an instructor was 
rated by fewer than 50 students, or by fewer than
three different classes, he was excluded from
the standardizing population, and is also ex- 
cluded from the present study. The obtained 
ranking distributions are based upon approxi- 
mately 1,500 cases. 





Student Evaluation of College Faculty Members 


Results 


On the freshman-sophomore level, a very 
significant and moderately high positive rela- 
tionship is found between grading leniency 
and faculty merit rating scores. Consult Table 
1 for more specific information. Through the 
use of a coefficient of determination described 
by Guilford,² it is evident that approximately
53% of the variance in the rating received 
by an instructor who teaches freshman-sopho- 
more level courses can be attributed to grad- 
ing leniency. No significant relationship is 
found between obtained rating and grading 
leniency on the junior-senior level. On a 
combined basis, freshman through senior, a 
significant but moderate relationship exists 
between obtained rating and grading leni- 
ency. Approximately 25% of the variance in 
an instructor’s rating can be attributed to 
grading leniency on an over-all four-year basis. 


Table 1

Spearman Rank-Difference Correlations for Faculty
Members Ranked on Three Factors

                                Compared Rankings
                           Grades      Rating      Grades
                           vs. Ab-     vs. Ab-     vs.
Class Level           N    sences      sences      Rating

Freshman-Sophomore    13   -.21        -.17        [illegible]
Junior-Senior         17   -.19        -.26        .43
Freshman-Senior       19   -.08        [illegible] [illegible]

* Significant at the five per cent level.
** Significant at the one per cent level.

No significant relationships are found be- 
tween obtained rating and absence extensive- 
ness, on either the freshman-sophomore or the 
junior-senior levels. However, a significant, 
negative, and moderate relationship is found 
between instructors ranked according to merit 
rating scores obtained by student evaluation 
and the same instructors ranked according to 
the median number of absences accumulated 
in their classes, on a four-year breakdown. 
Thus, an instructor with a lower number of 
absences receives the higher rating. Approxi-
mately 28% of the variance in absence ex-
tensiveness can be accounted for by the merit
rating scores of instructors. By the same
token, 28% of the variance in an instructor’s
rating can be attributed to absence extensive-
ness.

² Guilford, J. P. Fundamental statistics in psy-
chology and education. (2nd Ed.) New York:
McGraw-Hill, 1950.


Table 2

Multiple Correlations Between Faculty Ratings and
Grading Leniency Combined with
Absence Extensiveness

                           Correlation Data
Class Level           N    R            SE_R

Freshman-Sophomore    13   [illegible]  [illegible]
Junior-Senior         17   .47          [not legible]
Freshman-Senior       19   .70**        [not legible]

** Significant at the one per cent level.

No significant relationship is found be- 
tween instructors ranked according to grading 
leniency and instructors ranked according to 
the median number of class absences accumu- 
lated in their classes. The lack of such rela- 
tionship is evident on the freshman-sopho- 
more, junior-senior, and the combined fresh- 
man through senior levels. 

When absence extensiveness and grading 
leniency are combined, and the overlap be- 
tween these factors is held constant, the com- 
bination of these factors accounts for ap- 
proximately 53% of the variance in an in- 
structor’s rating on the freshman-sophomore 
level. Consult Table 2 for detailed data. 
The same combination of factors accounts for 
about 22% of the variance in an instructor’s 
rating on the junior-senior level, and for ap- 
proximately 50% of an instructor’s rating 
variance on the freshman through senior 
levels. 
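
The combined figure for the freshman through senior levels can be checked roughly with the standard two-predictor multiple correlation formula. This is a sketch under stated assumptions: the text does not say which computing formula was used, and two of the three input correlations (.50 and -.53) are inferred here from the reported variance percentages (about 25% and 28%) rather than read from Table 1; only the grades-versus-absences value of -.08 is taken from the table.

```python
from math import sqrt


def multiple_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors,
    computed from the three pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


r_rating_grades = 0.50     # assumed: square root of about 25% of variance
r_rating_absences = -0.53  # assumed: negative root of about 28% of variance
r_grades_absences = -0.08  # freshman-senior value legible in Table 1

r_squared = multiple_r_squared(r_rating_grades, r_rating_absences,
                               r_grades_absences)
print(round(r_squared, 2), round(sqrt(r_squared), 2))
```

Because leniency and absences are nearly uncorrelated with each other, the combined variance accounted for is close to the sum of the two separate percentages, which agrees with the "approximately 50%" stated above.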


Discussion 


For the sample used, the grades which 
faculty members assign students are reflected 
in the quality of rating which students assign 
faculty members. The extent to which stu- 
dents evaluate faculty members according to 
class grades received varies with academic 
levels of the students. Grading leniency ac-
counts for almost three times as much vari-
ance in faculty ratings on the freshman-sopho-
more level, as it does on the junior-senior
level where the relationship is not statistically
significant. Conceivably, students who sur- 
vived the selection process operating during 
the first two years consider faculty grading 
leniency as a relatively unimportant criterion 
on which to base evaluation of faculty 
members. 

Class absences are negatively correlated 
with faculty ratings. These results may indi-
cate varying degrees of student interest in the
classroom behavior of faculty members. If 
such is the case, lowly rated faculty members 
apparently repel students from their class- 
rooms, and accordingly, accumulate dispro- 
portionate numbers of class absences. How- 
ever, in the interpretation of the relationship 
between absence extensiveness and faculty 
rating scores, it should be noted that the fac- 
tor of absence permissiveness remains uncon- 
trolled. 

Absence permissiveness could operate di- 
rectly or indirectly. Direct operation could 
involve open avowal or subtle implication, on 
the part of highly rated instructors, that class 
attendance is not required for satisfactory 
course performance. Conversely, lowly rated 
instructors may insist upon daily attendance 
to the point where daily recitation grades 
carry an unduly preponderant weight in the 
determination of final class grades. Loading 
course examinations with textbook questions, 
while minimizing the inclusion of lecture ques-
tions, could illustrate the manner in which
indirect absence permissiveness would operate.


Alexis M. Anikeeff


Summary 


Nineteen faculty members were ranked in 
accordance with the merit rating scores as- 
signed to them by their students. Using 
Spearman rho, merit rating ranks were cor- 
related with grading leniency and absence ex- 
tensiveness rankings of the same instructors. 

1. Grading leniency correlated highest with 
merit rating scores on the freshman-sopho- 
more level, and lowest on the junior-senior 
level. 

2. Absence extensiveness correlated nega-
tively on all academic levels, but the correla-
tion was significant only on the combined
four-year breakdown. 

3. The selection process operating during 
the freshman-sophomore years could reason- 
ably account for a low and statistically non- 
significant relationship between grading leni- 
ency and student evaluated faculty merit 
ranking on the junior-senior level. 

4. Class interest of students could account 
for the negative relationships between faculty 
members ranked according to the number of 
class absences found in their classes and the
same instructors ranked according to teaching 
competence as evaluated by students. 


Received March 6, 1953.


Estimating Grade Reliability ¹


Scarvia B. Anderson 


Naval Research Laboratory, Washington 


When grade point ratio is used as the 
criterion of school “success” or “failure,” the 
need for an adequate estimate of the relia- 
bility of the ratios presents a recurring prob- 
lem. The author encountered the problem 
most recently in a study at George Peabody 
College for Teachers of the value of certain 
entrance tests in predicting freshman grade 
point ratio (1). If the results were to be 
used for the selection and counseling of stu- 
dents and for the determination of an ade- 
quate testing program, some estimate of the 
reliability of the criterion seemed essential. 

The tests used were the American Council 
on Education Psychological Examination; the 
Cooperative Reading Comprehension Test; 
the Cooperative Mechanics of Expression 
Test; and Otis Quick-Scoring Mental Ability 
Tests for grades four through nine, which 
were used as practice tests. Multiple cor- 
relations of test scores with weighted grade 
point ratios were .62 and .59 for one quarter 
and three quarters, respectively. It was rea- 
sonable to assume that grade point ratios for 
three quarters were more reliable than those 
for one quarter, but a statistical estimate of 
such reliability seemed desirable in the final 
interpretation of the results. 

Ebel (4) has presented a method, based 
upon analysis of variance, for estimating the 
reliability of sets of ratings, and in addition 
has considered in some detail the relationship 
between this procedure and those of Horst 
(5), Snedecor (7), Clark (2), Peters (6), and 
Cureton (3). He concludes that the intra-
class formula, such as his, Cureton’s, and
Snedecor’s, is generally preferable to the
average intercorrelation or generalized re-
liability formula, such as Horst’s, Clark’s, and
Peters’.

¹ The author is deeply indebted to Dr. Julian C.
Stanley, University of Wisconsin, for material help
in the preparation of this article and to Dr. E. E.
Cureton, University of Tennessee, and Clarence W.
Spence, George Peabody College for Teachers.

The opinions and assertions contained herein are
the private ones of the writer and are not to be con-
strued as official or reflecting the views of the Navy
Department or the naval service as a whole.

In the present paper, we shall discuss: (a) 
the differences obtained when the Horst and 
Cureton formulas were used to estimate the 
reliability of the same sets of freshman 
grades; and (b) a second problem, which 
arose in the application of the formulas, the 
use of unweighted versus weighted grade point 
ratios. 

Both Cureton’s formula (which, inciden- 
tally, is the result of a derivation parallel to 
Ebel’s) and Horst’s are based on the well-
known generalized formula for the reliability
coefficient:

$$ r = 1 - \frac{\text{error variance}}{\text{observed variance}} $$

In application here, the error variance is an 
estimate of the error variance of the individ- 
ual means, and the observed variance is the 
variance of the means for all of the indi-
viduals.

Cureton’s and Horst’s formulas are shown 
in Table 1. The chief statistical differences 
between them may be summarized as follows: 

1. Cureton uses a weighted variance for 
the estimate of the error variance, and Horst 
does not. 

2. Cureton uses a weighted variance of the 
person means, and Horst does not. 

3. Cureton divides by N − 1 in the variance
of the means, and Horst divides by N.

A careful study of the two methods and
their respective relevance to freshman grade
point ratio suggested that Cureton’s technique
was more appropriate for our use. If our
freshmen were considered a sample of a uni-
verse of Peabody freshmen, the relevance of
dividing by N − 1 was apparent. In addition,
we agreed with Cureton that his formula
would give a somewhat better reliability esti-
mate, since the values he uses for error vari-
ance and total variance of the person means
are “statistically independent in the sense of
analysis of variance, while those given [by
Horst] . . . are not quite independent” (3,
p. 2). Still more important, however, since
the results of the freshman test study were to
be used for predictive purposes, the weight-
ing of the variances, so that a mean based on
a smaller number of measures would receive
a smaller weight than a mean based on a
larger number of measures, should furnish a
better population estimate of reliability.


Table 1

Formulas for Estimating Reliability of Means of Unequal Numbers of Scores*

Horst (5)

$$ r = 1 - \frac{\sum \dfrac{Sx_i^2}{n_i(n_i - 1)}}{\sum (M_i - M)^2} $$

$$ r = 1 - \frac{\sum \dfrac{SX_i^2}{n_i(n_i - 1)} - \sum \dfrac{M_i^2}{n_i - 1}}{\sum (M_i - M)^2} \qquad (4H) $$

Cureton (3)

$$ r = 1 - \frac{\dfrac{\sum Sx_i^2}{\bar{n}\,(\sum n_i - N)}}{\dfrac{\sum n_i (M_i - M)^2}{\bar{n}\,(N - 1)}} $$

$$ r = 1 - \frac{\dfrac{\sum Sx_i^2}{\sum n_i - N}}{\dfrac{\sum n_i (M_i - M)^2}{N - 1}} $$

$$ r = 1 - \frac{N - 1}{\sum n_i - N}\left(\frac{\sum SX_i^2 - M \sum SX_i}{\sum (M_i\, SX_i) - M \sum SX_i} - 1\right) \qquad (4C) $$

* N = number of individuals, Sx_i² = sum of the squared deviation scores for
individual i, SX_i = sum of the raw scores for individual i (SX_i² = sum of the
squared raw scores), n_i = number of scores for individual i, n̄ = mean number
of scores for N individuals, M_i = mean score for individual i, M = mean score
for N individuals.
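
To make the contrast between the two estimates concrete, the sketch below computes both from a small set of hypothetical grade records (one list of grade points per student, each with at least two grades). It follows the deviation-score forms in Table 1 as read here, plus a raw-score form algebraically equivalent to Cureton's. Two points are assumptions rather than statements from the paper: that M in Horst's formula is the unweighted mean of the person means, and that M in Cureton's is the grand mean of all scores.

```python
def person_stats(scores):
    """Per-person (n_i, M_i, sum of squared deviations) tuples."""
    stats = []
    for s in scores:
        n = len(s)
        mean = sum(s) / n
        stats.append((n, mean, sum((x - mean) ** 2 for x in s)))
    return stats


def horst(scores):
    stats = person_stats(scores)
    # Assumption: M is the unweighted mean of the person means.
    grand = sum(m for _, m, _ in stats) / len(stats)
    error = sum(ss / (n * (n - 1)) for n, _, ss in stats)
    observed = sum((m - grand) ** 2 for _, m, _ in stats)
    return 1 - error / observed


def cureton(scores):
    stats = person_stats(scores)
    big_n = len(stats)
    total_n = sum(n for n, _, _ in stats)
    # Assumption: M is the grand mean of all scores (weighted by n_i).
    grand = sum(sum(s) for s in scores) / total_n
    within = sum(ss for _, _, ss in stats) / (total_n - big_n)
    between = sum(n * (m - grand) ** 2 for n, m, _ in stats) / (big_n - 1)
    return 1 - within / between


def cureton_raw(scores):
    """Raw-score form: needs only sums, sums of squares, and means."""
    big_n = len(scores)
    total_n = sum(len(s) for s in scores)
    sum_sx = sum(sum(s) for s in scores)
    sum_sq = sum(x * x for s in scores for x in s)
    sum_m_sx = sum((sum(s) / len(s)) * sum(s) for s in scores)
    grand = sum_sx / total_n
    ratio = (sum_sq - grand * sum_sx) / (sum_m_sx - grand * sum_sx) - 1
    return 1 - (big_n - 1) / (total_n - big_n) * ratio


# Hypothetical grade points, unequal numbers of courses per student.
grades = [[11, 8, 9, 10], [5, 6, 4], [8, 8, 7, 9, 6], [2, 5, 3]]
# Hypothetical grade points, three courses for every student.
equal_n = [[11, 9, 10], [5, 6, 4], [8, 7, 9], [2, 5, 3]]

print(round(horst(grades), 3), round(cureton(grades), 3))
```

When every student has the same number of grades, the two estimates differ only through the N versus N − 1 divisor, so they converge as the number of students grows; with unequal numbers of grades the weighting also comes into play.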

However, it was decided to use both meth- 
ods in estimating the reliability of the fresh- 
man grades for one quarter and for three 
quarters, in order that the results might be 
compared for discrepancies.²

² It is realized that any technique for estimating
reliability assumes independence of measures. In the
case of grade point ratio, it is difficult, if not impos-
sible, to meet this assumption. In addition to the
possibility of teachers discussing among themselves
the ratings they give students, a single grade point
ratio may contain grades from two or more courses
under one professor. In order to obtain a statistical
estimate of reliability, however, one seems to have
no choice but to use one of the standard reliability
formulas, recognizing the limitations imposed by the
conditions usually surrounding college grading.

At this point, it seems well to consider
briefly the first problem encountered in ap-
plication of the two formulas. In the original
analysis of the relationship between grades
and test scores, grades were weighted accord-
ing to the number of quarter hours that a
course carried:

$$ M_{iw} = \frac{\sum_{c=1}^{n_i} h_{ci}\, p_{ci}}{\sum_{c=1}^{n_i} h_{ci}} $$

where

M_iw = weighted grade point ratio for indi-
vidual i.
h_ci = number of quarter hours in any one
course c that individual i takes.
p_ci = points assigned to a grade in any one
course c that individual i takes.³
n_i = number of courses that individual i
takes.
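
The weighted ratio defined above, and the unweighted mean of grade points it is later compared with, can be sketched as follows. The point scale is the thirteen-step scale of the paper's footnote on grade points (A+ = 12 down to F = 0); the minus grades are garbled in this copy and are filled in here from the numerical sequence, which is an assumption. The student record and function names are hypothetical.

```python
# Grade-point scale; minus grades inferred from the sequence (assumption).
GRADE_POINTS = {
    "A+": 12, "A": 11, "A-": 10,
    "B+": 9, "B": 8, "B-": 7,
    "C+": 6, "C": 5, "C-": 4,
    "D+": 3, "D": 2, "D-": 1,
    "F": 0,
}


def weighted_gpr(courses):
    """Grade points weighted by quarter hours: sum(h * p) / sum(h)."""
    total_hours = sum(hours for _, hours in courses)
    return sum(GRADE_POINTS[g] * hours for g, hours in courses) / total_hours


def unweighted_gpr(courses):
    """Plain mean of grade points over the courses taken."""
    return sum(GRADE_POINTS[g] for g, _ in courses) / len(courses)


# Hypothetical record: (letter grade, quarter hours) for three courses.
record = [("A", 5), ("C+", 2), ("B-", 3)]

print(weighted_gpr(record), unweighted_gpr(record))
```

In this made-up record the five-hour A pulls the weighted ratio above the unweighted one; across a whole class such differences tend to average out, which is consistent with the very high correlations between the two ratios reported in the paper.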


It was reasoned that a grade in a five hour 
course would be considerably more reliable 
than a grade in, say, a two hour course, for in 
a five hour course the instructor would meet 
with a student more frequently, would prob-
ably give more quizzes, and would generally
be in a better position to give a more (sub-
jectively) reliable grade. However, if this
viewpoint was maintained, several adjust-
ments would have to be made in order to use
Cureton’s or Horst’s formula. We could not
let n_i equal the number of quarter hours taken
by an individual, since that interpretation
would not be compatible with the original
meaning of n_i in the formulas. A letter from
Dr. Cureton granted that a grade in a five
hour course would probably be somewhat
more reliable than a grade in a two hour
course; however, he indicated that the differ-
ence would be less than might be suggested
by the weights of 5 and 2, for in most courses,
regardless of the number of hours, a final
examination is given and instructors gen-
erally try to administer enough tests to give
(subjectively) reliable grades.

³ Points were assigned on the basis of the follow-
ing scale: A+ = 12 points, A = 11 points, A− = 10
points, B+ = 9 points, B = 8 points, B− = 7 points,
C+ = 6 points, C = 5 points, C− = 4 points, D+ =
3 points, D = 2 points, D− = 1 point, F = 0
points.

Unweighted grade point ratios were com-
puted as follows:

$$ M_i = \frac{\sum_{c=1}^{n_i} p_{ci}}{n_i} $$

where M_i = unweighted grade point ratio for
individual i.

These correlations between weighted and
unweighted grade point ratios were obtained:

.98 between weighted and unweighted grade
point ratios for three quarters.

.96 between weighted and unweighted grade
point ratios for one quarter.


As a result, any advantage of weighting
seemed so small that we were satisfied to go
ahead with the calculation of the reliability
coefficients, using unweighted grade point
ratios and letting n_i equal the number of
courses that individual i took.

The Cureton raw-score formula (4C) was
used first, and then since M_i, M, and SX_i
were already known, the raw score derivation
(4H) of the Horst formula was used. The
resulting reliability coefficients were as fol-
lows:

Grades for      Horst    Cureton

3 quarters       .90       .90
1 quarter        .48       .63

Although the first two r’s are identical,
there is a rather wide discrepancy between the
last two r’s. It seems that there is logically
no statistical method for testing the sig-
nificance of the difference between these two
estimates of the reliability of the same meas-
ures, and one must return to the original
formulas for an explanation of the numerical
differences. The fact that the reliability co-
efficients for three quarter grade point ratios
are the same and for one quarter grade point
ratios are different is directly attributable
to the weighting process used in Cureton’s
formula. With the large values of Σn_i (mean n_i
equals 13.48) for three quarters, the weight-
ing seems to have a negligible effect on the
value of r; while with the smaller Σn_i (mean n_i
equals 4.62) for one quarter, the weighting
process results in a considerable difference be-
tween the variance estimates substituted in
Cureton’s and those substituted in Horst’s
formula. Cureton’s formula gave for one
quarter a considerably smaller estimate of
error variance than did Horst’s.

The immediately obvious conclusions that
were drawn from the computed reliability co-
efficients were that: (1) the use of one quar-
ter’s grades alone would not be adequate for
our purposes; and (2) three-quarter grade
point ratios represented a fairly reliable cri-
terion.⁴ The difference obtained between the
Horst and Cureton coefficients for one quarter 
did not affect these conclusions. If there had 
been a question of interpretation, we should 
have used the reliability estimate given by 
Cureton’s formula for the reasons already 
given. 

In other cases where reliability estimates 
of unequal numbers of ratings were to be 
made, we would generally tend to use Cure-
ton’s formula when we were interested in re-
liability for the prediction of population be-
havior from a sample of that population or 


⁴ Interpretation is much more difficult than this
statement indicates, however. Since many of the
students had fewer different teachers than quarter-
courses during the three quarters (especially in the
required English sequence), grades received by an
individual from quarter to quarter were by no means 
independent of each other. How much less the esti- 
mated reliability coefficient of .90 would be if true 
independence existed cannot be judged from these 
data. It is interesting to note that despite markedly 
poorer reliability of first-quarter grade point ratios, 
the multiple R between test scores and first-quarter 
GPR’s was slightly higher than for the entire year 
(.62 vs. .59).


An Economical Test Battery for Predicting Freshman Engineering Course Grades


William Coleman * 


The University of Tennessee 


The shortage of trained engineers at present 
is estimated to be nearly 65,000, and occupa- 
tional market analysts have pointed out that 
this shortage is likely to remain for a num- 
ber of years. In light of this situation, engi- 
neering educators have been engaging in an 
active campaign of informing and inviting
able high school students to consider engineer- 
ing as a profession. Since the training of 
engineers is an expensive program as well as 
a demanding one for the student, means for 
facilitating selection and guidance of prospec- 
tive engineers are needed, particularly meth- 
ods that might be utilized by relatively un- 
trained counselors in the secondary schools. 

With this as a motivational basis, a trial
test battery was administered to entering
freshman engineering students at the Univer-
sity of Tennessee in September, 1950. The
battery used included the A.C.E. Psycho-
logical Examination, 1949 edition, the Co-
operative English Test, Form OM, the Co-
operative Algebra Test, Form S, the Minne-
sota Paper Form Board, and the Bennett
Mechanical Comprehension Test. This bat-
tery is similar to the one recommended by
Stuit and his collaborators (5) for use with
engineers except that no interest inventory
was included. Studies reviewed by Moore
(4) and a recent investigation by Cole (2)
also suggest the usefulness of instruments
similar to the ones used here. The particular
tests were selected because of their ease of
administration and low cost.

* The writer wishes to acknowledge the assistance
of Mr. Lloyd E. Fish, who performed most of the
statistical computations.

Grades tabulated for this group from the 


Table 1 


Coefficients of Correlation between Engineering Freshman Test Scores and Grades in Selected Courses 


Mathe 
matics 


46** 
82 


Predictor Test English 
32** 


ACE Q 6 


33** 
82 


A3** 


ACE L 86 


st" 
83 


.61** 
Coop. Engl. 87 
56** 

81 


AO** 


Coop. Alg. 85 


.24* 


Minn. Ppr. Fm. Bd. 82 86 
40** 


83 


06 
Benn. Mech. 


87 


Engineering 


† All r’s are Pearson product-moment. The r’s appear in boldface type with the
** Significant at the 1% level. 
* Significant at the 5% level. 


Grades 


Civil Mechanical Engineering 


Engineering Problems 


a 


Drawing Engineering 


.20 25° 
61 


19%" 


26* 


61 


25 a 


39** 


50** 


30 


30" 


AS 
81 


19 


62 58 59 
N’s to the lower right. 
Table 2

Means and Standard Deviations for Test Scores
and Course Grades

                                        Standard
Variable                        Mean    Deviation
ACE Q                           41.7      7.9
ACE L                           61.2     16.3
Cooperative English            141.2     29.5
Cooperative Algebra             34.0     12.2
Minnesota Paper Form Board      45.3      7.9
Bennett Mechanical              32.7     12.6
Mathematics                      6.3*     2.8
English                          6.1*     2.5
Engineering Drawing              7.4*     2.0
Civil Engineering                2.9      1.2
Mechanical Engineering           2.4      1.2
Engineering Problems                      1.3

* Summation of three grades.


fall quarter, 1950, through the fall quarter, 
1951, constituted the criteria for the study. 
Freshman year grades generally take care of 
most of the screening of engineering candi-
dates at the University of Tennessee, as fail-
ures are much less likely after the first
few quarters in the engineering curriculum. 
The entering class is a relatively heterogene- 
ous group, as no selection procedures are used 
other than a minimum mathematics require- 
ment of four high school units. 

Instead of using the mean point hour ratio 
for all courses combined, correlation coef- 
ficients were computed for the various tests 
with grades in the courses in the freshman 
engineering curriculum. Table 1 shows the 
various courses and tests for which correla- 
tions were computed. 




Discussion of Results 


Though the coefficients found in Table 1 
are not especially high, several of them are 
sufficiently so to be regarded as useful for 
selection or guidance situations. With a 
population consisting of high school students, 
a higher correlation would be hypothesized 
for this more heterogeneous group. 

The best predictive instrument in the bat- 
tery used seems to be the Cooperative Alge- 
bra Test; the Cooperative English Test ranks 
second. The Bennett Mechanical Compre- 
hension Test tends to get better correlations 
with grades than either of the A.C.E. scores. 
In an unpublished master’s thesis at the Uni- 
versity of Tennessee, Tarvin (6) found that 
the algebra and English tests yielded higher 
correlations than either A.C.E. score among 
freshman students. From these data and 
other studies (4, 11), the predictive value of 
so-called scholastic aptitude tests such as the 
A.C.E. must be questioned in comparison to 
outright achievement tests. 

Further examination of Table 1 will reveal 
that in different courses different instruments 
may be the most effective predictors. It is 
no surprise to find the algebra test predicting 
mathematics grades best, and the English test 
performing in a similar fashion for English 
grades. The Bennett is clearly the best pre- 
dictor in engineering drawing instead of the 
Minnesota Paper Form Board as might have 
been expected. No test emerges as a good 
predictor for civil engineering. This may re- 
flect to some extent the unreliability of grades 
in this course though further evidence is 
needed. In mechanical engineering the Eng- 


Table 3

Multiple Correlation Work Sheet, Test Scores and Course Grades,
University of Tennessee Engineering Freshmen

Criterion        Test Predictors                                          R      N
Mathematics      Alg. (.558) + Bennett (.612)** + Eng. (.622)             .622   82
English          Eng. (.608) + Alg. (.614) + Bennett (.622)               .622   86
Engr. Drawing    Bennett (.453) + Alg. (.505)* + ACE L (.541)*            .541   80
Civil Engr.      Alg. (.290) + Eng. (.310) + Minn. (.314)                 .314   62
Mech. Engr.      Bennett (.556) + Eng. (.664)** + Alg. (.686) + ACE Q     .708   58
Engr. Problems   Alg. (.496) + Bennett (.595)** + ACE Q (.612)

** Increment in R significant at 1% level.
* Increment in R significant at 5% level.



lish test stands out as the best predictor. Does 
this reflect an emphasis in grading in this 
course on competence in English usage? The 
algebra test and the Q score seem to be 
the best predictors in engineering problems, 
though the Bennett provides a moderate cor- 
relation coefficient. 

It is interesting to note that the Q score is 
more valuable than the L for this engineering 
group in the courses considered. This, of 
course, is contrary to the usual findings with 
the A.C.E. in other curricula (3, 11). The 
Minnesota Paper Form Board yielded gen- 
erally the lowest correlation coefficients of 
any of the tests. 

Multiple correlations were then computed 
for four of the criterion variables, grades 
in English, engineering drawing, engineering 
problems, and mathematics. Table 3 presents 
these data showing the best multiple correla- 
tions that can be obtained with the tests used. 

The addition of further tests does not add 
much in the case of English and mathematics 
where the zero order correlations were moder- 
ately high in the first place. In engineering 
drawing and engineering problems the extra 
tests appreciably contribute in improving the 
correlation coefficients, from .496 to .612 for 
the problems course, and from .453 to .541
for the drawing course. Additional tests seem 
warranted for more reliable prediction in the 
case of these two courses. 
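
The gain from adding a second test to a first can be sketched with the standard two-predictor formula for the multiple correlation. The sketch below is illustrative only: the zero-order value .496 is the algebra entry from Table 3, but the second predictor's correlation and the intercorrelation between the two tests are hypothetical, since the study does not report the intercorrelations among the tests.

```python
import math


def multiple_r_two_predictors(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation R of a criterion on two predictors.

    r1, r2: zero-order correlations of each predictor with the criterion.
    r12:    correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)
    return math.sqrt(r_squared)


# Zero-order r of the algebra test with engineering problems grades (Table 3).
r_algebra = 0.496
# HYPOTHETICAL values, not taken from the study.
r_second_test = 0.40
r_between_tests = 0.30

combined = multiple_r_two_predictors(r_algebra, r_second_test, r_between_tests)
print(f"R with one test: {r_algebra:.3f}; with two tests: {combined:.3f}")
```

The lower the intercorrelation between the two tests, the larger the increment in R, which is why a test with a modest zero-order correlation can still add appreciably to a battery.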


Summary 


In conclusion, it can be stated that this 
study has demonstrated the satisfactory ap- 
plicability of several economical (in terms of 
administration and cost) tests for the prob- 
lem of selecting or guiding prospective engi- 
neering students. The tests which produced 
the best correlation coefficients were the Co-
operative Algebra, Cooperative English, and
the Bennett Mechanical Comprehension Tests. 
The A.C.E. Psychological Examination and 
the Minnesota Paper Form Board were not 
as adequate. The findings in this study seem 
to generally confirm those of previous in- 
vestigators. 


Received February 11, 1953. 
Over the past ten years the matter of im-
proving the selection of engineering students
has been under investigation at the City Col- 
lege. A variety of tests has been used in 
conjunction with the high school average in 
determining which students should be ad- 
mitted. The ACE Psychological Examina- 
tion, the Cooperative General Achievement 
tests, and the Pre-Engineering Inventory have 
been used as well as some special tests con- 
structed for the college by Kenneth W. 
Vaughn. Tests have been added and elimi- 
nated on the basis of studies relating test 
scores to academic grades. At this point the 
program has become fairly stable and conse- 
quently it was felt that a report based on the 
present battery might be of interest to other 
colleges. 

Specifically this study was designed with 
two purposes in mind: to evaluate the effec- 
tiveness of high school averages and scores 
on entrance examinations as a basis for pre- 
dicting four-year grade-point average in an 
engineering college, and to study the relation- 
ship between the four-year college grade-point 
average and ratings on two standard interest 
questionnaires (the Strong Vocational Inter- 
est Blank and the Kuder Preference Record). 
It was also hoped that ratings on the Kuder 
obtained from students during their freshman 
year might be compared with those obtained 
during their senior year. The number of stu- 
dents taking the questionnaire on both occa- 
sions was small, but since there are not much 
data of this type available we shall, never- 
theless, present them. 

The results reported are based on students 
graduating from the School of Technology of 
the City College during the calendar year of 
1951. Of the 521 graduates, 433 were in- 
cluded in one or more phases of this study. 
Since all available data were used in each part 
of the study the number of cases varies from 
one part of the study to another. 


A four-year college average (weighted ac- 
cording to credits and grades), calculated by 
the School of Technology,¹ was available for
each graduate. 


Effectiveness of Selection Techniques 


Studies based on data from previous enter- 
ing classes have indicated that first term 
grades at City College can be most effectively 
predicted by using a Composite Score based 
upon high school average (weight of five), 
and scores on the following tests: Scientific 
Verbal Ability (weight of one), Comprehen- 
sion of Scientific Materials (weight of two), 
and General Mathematical Ability (weight 
of two).²
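
The weighting described above can be sketched as a simple weighted sum. The weights are those given in the text; the dictionary keys are hypothetical names, and the sketch assumes that all four components have already been placed on a common scale, a step the text does not describe.

```python
# Weights reported in the text for the Composite Score.
COMPOSITE_WEIGHTS = {
    "high_school_average": 5,
    "scientific_verbal_ability": 1,
    "comprehension_of_scientific_materials": 2,
    "general_mathematical_ability": 2,
}


def composite_score(scores: dict) -> float:
    """Weighted sum of the four components.

    ASSUMPTION: every value in `scores` is already expressed on a common
    scale (for example, standard scores); the source does not say how the
    components were scaled before weighting.
    """
    missing = set(COMPOSITE_WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return float(sum(w * scores[name] for name, w in COMPOSITE_WEIGHTS.items()))


# HYPOTHETICAL applicant, scores expressed as standard scores.
applicant = {
    "high_school_average": 0.8,
    "scientific_verbal_ability": 0.2,
    "comprehension_of_scientific_materials": 0.5,
    "general_mathematical_ability": 1.1,
}
print(composite_score(applicant))
```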

The intercorrelations between the four-
year college average, high school average, and
the scores on the three tests entering into the 
Composite Score are presented in Table 1 
along with the means and standard deviations 
for these variables. The correlations between 
the four-year college average and the other 
variables range from 0.30 to 0.50. The cor- 
relation between the Composite Score and the 
four-year college average is 0.53,⁴ which is


1 The authors would like to take this opportunity
to thank Professor John R. White for making this 
material available to us. 

2 These weights were determined by means of re- 
gression equations in which the effectiveness of these 
three tests as well as the following were determined: 
ACE Psychological Examination, General Verbal 
Ability, Social Science Verbal Ability, and Spatial 
Visualizing Ability. All of the tests, except the 
ACE, are part of the Inventory of Scholastic Ability 
and were developed by Kenneth W. Vaughn. Sev- 
eral of the tests are similar to those included by 
Vaughn in the original Pre-Engineering Inventory 
(13). For a detailed description of the entrance 
examination program at the City College the reader 
is referred to an article by Long and Perry (5). 

3 It may be of interest to the reader to compare
these correlations with those reported in the litera-
ture for other colleges [2, 6; see summaries by
Kandel (4), Moore (8), Stuit (11)].

4 The range of the test scores has been reduced by 
about 10 per cent due to the elimination of students 
over a four-year period. The effectiveness of the 
tests is thereby reduced. In previous studies cor 
Academic Achievement in Engineering


Table 1

Intercorrelations, Means, S.D.s: Four-Year College Average, High School Average, and
Three of the Entrance Examinations (N = 182)

Variables                            1
1. Four-year college average        --
2. High school average              .40
3. General Math. Ability            .50
4. Comp. Sci. Materials             .41
5. Sci. Verbal Ability              .30
Mean                               80.6
S.D.                                1.8





only a slight increase over the correlation of 
0.50 between the score on the test measuring 
General Mathematical Ability and the four- 
year college average, but a sizable increase 
over the correlation of 0.40 between high 
school average and the four-year college aver-
age.⁵ Adding the other tests given as part of
the Entrance Examination (see footnote 2) to 
the Composite Score does not bring about a 
significant increase in this correlation of 0.53 
between the Composite Score and the four- 
year college average. 
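
Footnote 4 notes that the range of test scores was reduced by about 10 per cent through the elimination of students, and that the observed correlations are lowered as a result. The authors do not apply a correction. As an illustration only, the sketch below uses the standard correction for direct restriction of range on the predictor, under the assumption that "about 10 per cent" means the restricted standard deviation is 0.9 of the unrestricted one; that reading of the footnote is an assumption, not something the text confirms.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Estimate the correlation in the unrestricted group.

    r_restricted: correlation observed in the restricted (surviving) group.
    sd_ratio:     unrestricted SD divided by restricted SD of the predictor.
    Standard correction for direct restriction on the predictor.
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r2 = r_restricted ** 2
    return (r_restricted * sd_ratio) / math.sqrt(1.0 - r2 + r2 * sd_ratio ** 2)


# Observed correlation between the Composite Score and four-year average.
observed = 0.53
# ASSUMED: restricted SD is 90 per cent of the unrestricted SD.
ratio = 1.0 / 0.9

print(round(correct_for_range_restriction(observed, ratio), 2))
```

Under this assumption the corrected value is only a few hundredths above the observed .53, which suggests that a 10 per cent reduction in range accounts for a small part of the difference from the .55 to .70 reported for first-term grades.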


Interests and Grade-Point Average 


Strong Vocational Interest Blank. The 
Strong was administered to 158 of the stu- 
dents as seniors (12 in Chemical, 38 in Civil, 
67 in Electrical, and 41 in Mechanical Engi- 
neering). The mean standard score and the 
letter grade equivalents on the scale for the 
engineers and on the group scales are pre- 
sented in Table 2. The composite profile for 
these students shows a B+ on the scale for 
the engineers and an A on the Group II scale 


relations ranging from 0.55 to 0.70 have been found 
between the Composite Score and the first term 
average (5). 

5 It should be mentioned that about three quarters 
of these students were admitted on the basis of high 
school average alone, while a quarter were admitted 
on the basis of the Composite Score. This means 
that the test in mathematics was part of the selec- 
tive procedure in only a small part of the sample, 
whereas the high school average was used in all in- 
stances (either alone or in combination with test 
scores). This situation explains to some extent why 
the correlation between the mathematics test and 
college grades is higher than that between high 
school averages and college grades. 


(chemist, engineer, mathematician, and physi- 
cist). The correlations between the various 
scales of the Strong and the four-year college 
averages are low and not significant (see 
Table 2). These correlations are, of course, 
lowered to some extent by the fact that the 
academically weaker students have dropped 
out of engineering, as have many of those stu- 
dents with little or no interest in engineering. 

It was interesting to note in analyzing the
interest pattern of the engineering students 
that 82 per cent of them obtained A or B+ 
ratings on the Group II scale, whereas Strong 
reports 77.5 per cent of his criterion group for 
Group II obtained A or B+ ratings. 

Kuder Preference Record. The Kuder 
was given to 172 of the graduating seniors 


Table 2

Correlations between Scores on the Strong Vocational
Interest Blank (Form M) and the Four-Year
College Average (N = 158)

                                          Letter Grade
Scales of the Strong            Mean*     Equivalent
Individual Scale
  Engineers                     42.6      B+
Group Scales
  Group I (Human Science)       41.1
  Group II (Technical)          46.9      A
  Group V (Personnel)           36.8      B
  Group VIII (Office)           30.4
  Group IX (Sales)              30.4
  Group X (Verbal)              33.8

* Standard scores.








Louis Long and James D. Perry 


Table 3

Correlations between Scores on the Kuder Preference
Record (Form BM) and the Four-Year
College Average (N = 172)

                                Percentile
Scales of Kuder       Mean      Equivalent†
Mechanical            91.2      65
Computational         40.2      66
Scientific            76.9      80
Persuasive            64.5      35
Artistic              50.5      65
Literary              49.8      60
Musical               18.2      61
Social Service        64.8      30
Clerical              42.3      21

* Significant at the 5 per cent level.
** Significant at the 1 per cent level.
† Based on male adults (1946 profile sheet).


(13 in Chemical, 50 in Civil, 79 in Electrical, 
and 30 in Mechanical Engineering). The 
two most relevant scales on the Kuder would 
be mechanical and scientific. The mean 
scores on these two scales were 91.2 and 76.9 
respectively (Table 3). The equivalent per- 
centile ratings would be the 65th and the 
80th, using the male adult norms presented 
in the 1946 edition of the Kuder profile sheet.
The correlations between the various scales 
of the Kuder and the four-year college aver- 
age are all low (Table 3) but a few of them 
are significant at the five per cent level and 


one at the one per cent level. These corre- 
lations are, of course, lowered by the same 
factors mentioned in connection with the cor- 
relations between scores on the Strong and 
grades. 


Kuder Freshman-Senior Correlations 


Thirty-two students who took the Kuder 
during their senior year also took it during 
their freshman year. In comparing the mean 
scores (Table 4) the only difference that is 
statistically significant is that for the scientific 
scale (mean of 81.8 as freshmen; mean of 73.5 
as seniors). 

The correlations between the two sets of 
scores vary considerably (Table 4) from one 
scale to another (—.22 to +.66). These cor- 
relations should be thought of not only as an 
index of reliability but also as an index of 
stability of interests over a four-year period. 
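
The freshman-senior coefficients discussed above are ordinary product-moment correlations between two administrations of the same scale. A minimal sketch of the computation follows; the paired scores are hypothetical and are not data from the study.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson product-moment correlation between paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both lists")
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL scores on one Kuder scale for five students.
freshman_scores = [82, 75, 90, 68, 88]
senior_scores = [74, 70, 85, 72, 80]

print(round(pearson_r(freshman_scores, senior_scores), 2))
```

A high coefficient indicates that students kept roughly the same rank order on the scale over the four years, even if the group mean shifted, as it did on the scientific scale.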

Finding only a limited relationship between 
the ratings on the interest questionnaires and 
academic grades is what one would expect on 
the basis of other studies (1, 3, 7, 9; for 
summaries see 11 and 12). Of course, in a 
counseling situation the interest question- 
naires are used with the idea of obtaining in- 
formation about the interest pattern, not with 


6 It is interesting to note that the only scale with
an r significant at the one per cent level is the 
Literary Scale, which is the same one Yum (14) 
found to be significantly related to grades made by 
the men in his study. 


Table 4 


Correlations between Scores on Kuder Obtained during Freshman and Senior Years (N = 32) 


Mean Score 
Fresh. 


Scales of Kuder Sr. 





Percentile 
Equivalent* 


Fresh. _ Sr. 


Fresh. 





85.8 
34.7 
81.8 
65.3 
55.0 
43.5 
19.8 
67.1 
42.3 


90.3 
36.8 
73.5 
67.6 
53.8 
46.9 
22.0 
62.0 
37.8 


Mechanical 
Computational 
Scientific 
Persuasive 
Artistic 
Literary 
Musical 

Social Service 
Clerical 





* Based on male adults (1946 profile sheet). 


64 


C7 


57 
48 
88 
37 


4d 


42 


17.8 
10.5 
9.9 
12.9 
13.8 
13.6 
8.3 
18.8 
13.6 


35 
21 









the idea of getting information that will help 
to predict academic grades.


Summary and Conclusion 


Using a weighted grade-point average based 
on four years of college work as a criterion 
the results of this study indicate that the 
selection of freshman engineering students 
can be improved by the use of both high 
school averages and test scores. The effec-
tiveness of the following tests was investi-
gated: Scientific Verbal Ability, Comprehen-
sion of Scientific Materials, and General
Mathematical Ability. 

The correlations found between two inter- 
est questionnaires (Strong and Kuder) and 
college grades are not high enough to warrant 
the inclusion of ratings on such questionnaires 
in a selection battery, yet it is felt that
such instruments are useful in an individual 
counseling situation. 
Differences
In 1951 the 20-year-old Allport-Vernon
Study of Values, a scale for measuring evalua- 
tive attitudes, was revised by Allport, Vernon, 
and Lindzey (1). Especially, its social scale
was altered considerably in an attempt to 
secure greater homogeneity. 

As originally, the average score of the 
standardization group on each of the six 
values has been equalized, now approximating 
40 for men and women combined. Marked 
systematic sex differences with respect to both 
means and standard deviations remain, how-
ever. The women are more religious, aesthe-
tic, and social; the men are more theoretical,
political, and economic. The range of the 12 
means listed in the Manual of Directions (1) 
is nearly seven raw-score units, while vari- 
ances go from 37 to 111. 

Because each testee has exactly 240 points 
to allot, there is no individual profile level, 
since the mean of his six value scores must 
be 40. The revised booklet supplies only one 
norm profile, employing a mean of 40 for any 
value for either sex. 

If a profile is to be used at all, it seems 


desirable to remove the group level factors— 
mean and standard deviation—separately for 
each sex. This has been done in Table 1 for 
the 1,816 college students who make up the 
general norms. Note there, for example, that 
a raw theoretical score of 53 has the same 
standard-score meaning for men as a theo- 
retical score of 46 has for women. Further- 
more, 43 on theoretical is for men equivalent 
to 37 on aesthetic. 
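
The removal of group level factors described above amounts to converting each raw score to a standard score within the testee's own sex. The sketch below is illustrative only: the means are taken as the 50th-centile entries of Table 1 (43 for men and 36 for women on the theoretical scale), and the standard deviation of 7.8 is an assumed value chosen to fall within the range of variances quoted from the Manual; neither is a norm reported in this note.

```python
def standard_score(raw: float, mean: float, sd: float) -> float:
    """Deviation of a raw score from the group mean, in SD units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (raw - mean) / sd


# ASSUMED norms for illustration: (mean, SD) by scale and sex.
ASSUMED_NORMS = {
    ("theoretical", "M"): (43.0, 7.8),
    ("theoretical", "W"): (36.0, 7.8),
}


def sex_specific_standard_score(scale: str, sex: str, raw: float) -> float:
    """Standard score of `raw` relative to the norm group of the same sex."""
    mean, sd = ASSUMED_NORMS[(scale, sex)]
    return standard_score(raw, mean, sd)


z_man = sex_specific_standard_score("theoretical", "M", 53)
z_woman = sex_specific_standard_score("theoretical", "W", 46)
print(round(z_man, 2), round(z_woman, 2))
```

With these assumed norms a man's 53 and a woman's 46 lie the same distance above their respective group means, which is the equivalence the centile sheet expresses.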

Two cautions are appropriate here. First, 
Table 1 is based upon national norms and 
may therefore be somewhat imprecise in cer- 
tain local situations. For example, when 
scanning profiles we should remember that 
both women and men in the Southeast may 
tend to score higher on the religious scale 
than do those in other sections (2,3). Second, 
as the Study of Values authors warn (1), a 
“high” score is high in an inter-individual 
sense only if comparisons are made among 
persons who can reasonably be expected to 
have the same average value level. There- 
fore, interpretations should usually be con- 
fined to the relative prominence of intra- 


Table 1*

Centile Sheet for College Men and Women on Allport-Vernon-Lindzey Study of Values
Note: Based upon the Norms (851 Men, 965 Women) in the Manual of Directions (1).

           Theoretical   Economic   Aesthetic   Social    Political   Religious
Centile     M     W      M     W     M     W    M    W     M     W     M     W
90         53    46     54    48    50    53   47   50    51    46    50    57
75         48    41     48    44    44    48   43   46    47    42    44    50
50         43    36     42    39    37    42   38   41    43    38    37    43
25         38    31     36    34          36   33   37    38    34    30    36
10         34    27     30    29    25    31   28   32    34    30    24    30


* This is an abbreviated table. To reduce costs the original table has been deposited with the American
Documentation Institute. Order Document 3960 from ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C., remitting $1.25 for microfilm (images 1 inch high on standard
35 mm. motion picture film) or $1.25 for photocopies (6 × 8 inches) readable without optical aid.
individual values, since the inter-individual
meaning of either raw or standardized level-
free scores will not be clear when heterogene-
ous groups are involved.

Table 1 is merely a statistical attempt to
rid the scale of certain inequalities that seem
to make intra-individual comparisons less
precise.


References

1. Allport, G. W., Vernon, P. E., and Lindzey, G. Study of Values: a scale for measuring the dominant interests in personality. Booklet and manual of directions. Boston: Houghton Mifflin, 1951.
2. Gray, Susan W. A note on the values of southern college women, white and Negro. J. soc. Psychol., 1947, 25, 239-241.
3. Stanley, J. C. and Gray, Susan W. Sex differences and self-insight on Spranger’s valu
A Scale for Measuring Work Attitude for the MMPI


Mary Tydlaska 


Columbia-Southern Chemical Corporation, Lake Charles, Louisiana 


and Robert Mengel 


Lake Charles, Louisiana Air Force Base 


The Minnesota Multiphasic Personality In-
ventory (3) is one of the most recent and
among the best of personality inventories.
It is designed to measure many aspects of
personality by scoring various combinations
of items. Although this scale has found great
use in its present form in clinics and hos-
pitals, it has not been extensively used by in-
dustry in pre-employment testing. For this
latter purpose it was thought desirable to
determine if there were items which could 
distinguish between individuals whose per- 
sonality organization expresses desirable atti- 
tudes toward work and good motivation to- 
ward it and individuals whose work attitude 
is notoriously poor. 


Selection of Subjects 


Two groups of subjects were studied. Two 
examples of work attitude were available. 
The 50 subjects from the Columbia-South- 
ern Chemical Corporation in Lake Charles, 
Louisiana are current employees who were 
given the MMPI in a program of pre-employ- 
ment testing which preceded their employ- 
ment. Only those employees who had com- 
pleted two or more years of satisfactory work 
performance were included in this study. 
Satisfactory work performance was based on 
merit ratings given semi-annually and a mean 
score of 3 (defined as ‘satisfactory’) was used 
as the criterion. 

The 60 air force ‘poor work attitude’ sub- 
jects whose MMPI records were used in this 
study were male white air force service per- 
sonnel in the 806th Supply Squadron at the 
Lake Charles, Louisiana Air Force Base. The 
category ‘poor work attitude’ represents 43 
A. W. O. L. cases, 7 disciplinary problems, 8 
individuals suspected of malingering, and 2 
miscellaneous cases. 

The senior writer served, during the sum- 


mer of 1952, as a consultant in administering 
and interpreting a battery of tests designed to 
aid the commanding officer in working with 
these and similar individuals. An evaluation 
of ‘poor work attitude’ for each of these 60 
cases was made on the basis of one or more 
interviews and test data, including a sentence 
completion test and the MMPI.

The groups of air base ‘poor work attitude’ 
cases and ‘satisfactory work attitude’ em- 
ployees were matched for certain items of 
biographical data. These variables include 
intelligence (an Otis IQ for the industrial 
employees and an Airman’s Qualifying Exam 
score for the air base personnel), age, educa- 
tion, general occupational level, and marital 
status. The typical subject was about 27 
years of age, had average intelligence, had
completed the eleventh grade of school, and 
was more likely to be married than single. 


Purpose


The original purpose of this study was to 
utilize the 60 ‘poor work attitude’ air base per- 
sonnel MMPI scores as criteria to evaluate a 
number of MMPI items previously selected on 
an a priori basis by seven individuals in the 
field of personnel selection and testing, as rep- 
resenting information indicative of an ap- 
plicant’s work attitude. This group of seven 
experts was composed of three individuals 
teaching courses in industrial psychology and 
associated with a college or university. The 
remaining four judges were personnel or em- 
ployment managers with 15 years mean ex- 
perience in personnel selection. 

These items were selected in the following 
way. Each judge was asked to indicate on an 
MMPI group score sheet those statements 
and their deviant response which would give 
him insight into the general motivational pat- 
tern and work attitude of an applicant for 
employment. All items which were selected
by as many as three out of seven judges were 
included in the experimental form of the 
Work Attitude Scale. 

This preliminary screening of MMPI items 
was undertaken to eliminate a number of an- 
ticipated items such as those which were 
found most valid in screening A. W. O. L. 
recidivists, “excessive use of alcohol, misbe- 
havior in school, trouble with the law . . .” 
(1, p. 231). The writer was aware of the 
possibility that such items might be found 
to discriminate but postulated that they would 
not contribute the specific type of informa- 
tion which would be most valuable in helping 
an employment manager gain insight into a 
potential employee’s subsequent work atti-
tude. Most items of this nature were not
selected by three or more judges and, thus,
were not included in the experimental Work
Attitude Scale. A total of 58 items composed
the experimental scale.

A further consideration for eliminating 
items not selected by three or more judges 
was the desire to establish a number of items 
which would be of practical value and specific 
interest to employment managers in pre-em- 
ployment testing. Deviant responses to these 
selected items can be read individually, and 
they can then be evaluated subjectively as 
well as quantitatively scored. 

A technique designed to contribute more 
meaningful information from a normal MMPI 


profile would have great utility in aiding a 
non-clinically oriented personnel staff mem- 
ber to evaluate applicants from the stand- 
point of their potential adjustment to an em- 
ployment environment. The over-all design 
of this study was to provide for pre-employ- 
ment testing an MMPI Work Attitude Scale 
tailor-made for that specific purpose. 


Procedure 


An examination of the sub-scales in terms 
of their relationship to the two groups of in- 
dividuals included an inspectional analysis of 
the MMPI profiles. This inspection was con- 
ducted in order to determine the number of 
profiles classified as normal and the number 
having T scores at 70 on one or more sub- 
scales. Table 1 presents the results of this 
inspectional analysis. A comparison was also 
made of the previously selected individual 
MMPI items and both groups were scored 
for these individual items. 


Results 


Significant differences were found between 
the profiles of the ‘poor work attitude’ in- 
dividuals and the ‘satisfactory work attitude’ 
employees. For example, 43 (or 71.7 per
cent) of the 60 ‘poor work attitude’ cases had
one or more T scores of 70 or more, while only 9
(or 18 per cent) of the 50 ‘satisfactory work
attitude’ employees had one or more scores
of 70 or more.


Table 1

Inspectional Analysis of MMPI Profiles of ‘Satisfactory Work Attitude’ Employees and
‘Poor Work Attitude’ Air Base Service Personnel

                            ‘Satisfactory Work Attitude’     ‘Poor Work Attitude’ Air Base
                            Employees                        Personnel
                            Cumulative N    Cumulative %     Cumulative %
Normal Profile                  41              82               28.4
1 sub-scale 70 or over          49              98               53.4
2 sub-scales 70 or over                                          63.4
3 sub-scales 70 or over                                          76.7
4 sub-scales 70 or over                                          93.3
5 sub-scales 70 or over                                         100
Table 3

Scores on a Tentative Work Attitude Scale

                    Frequency of Satisfactory     Frequency of Poor Work
Tentative Work      Work Attitude Employees       Attitude Air Base Personnel
Attitude Scale      (N = 50)                      (N = 60)
25-29                0                             8
20-24                1                            15
15-19                                             18
10-14                4                            12
5-9                 25                             1
0-4                 14                             0
Mean                 7.0                          16.4
S.D.                 4.1                           7.4


Only 37 items, from the 58 previously se- 
lected by three or more judges, were found to 
distinguish between ‘poor work attitude’ in- 
dividuals and ‘satisfactory work attitude’ em- 
ployees at the .01 level of confidence. The 
previously selected items in the experimental 
scale with the highest chi-square values were 
then combined and are presented in Table 2¹
as a Work Attitude Scale. The items are ar- 
ranged in the following order: rank order in 
differentiating ability, the MMPI booklet 
number of the item, the deviant response, the 
MMPI item, the number of each group giv- 
ing the deviant response, and the chi-square 
value attached to the deviant response. 
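
The chi-square comparison of deviant responses in the two groups can be sketched for a single fourfold table. The item counts themselves are in the deposited table, so the example below instead uses the profile counts given in the Results (43 of 60 and 9 of 50 with one or more elevated scores). The formula is the ordinary chi-square for a 2 × 2 table without continuity correction; whether the authors applied a correction is not stated.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square (1 degree of freedom) for a fourfold table.

    a: group 1, deviant response      b: group 1, non-deviant response
    c: group 2, deviant response      d: group 2, non-deviant response
    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must have at least one case")
    return n * (a * d - b * c) ** 2 / denominator


# Critical value of chi-square with 1 degree of freedom at the .01 level.
CRITICAL_01 = 6.635

# Profile counts from the Results: poor group 43 of 60, satisfactory 9 of 50.
value = chi_square_2x2(43, 17, 9, 41)
print(round(value, 1), value > CRITICAL_01)
```

An item would be retained under the rule described above when its chi-square exceeds the critical value at the .01 level.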

The MMPI’s of the two groups were re- 
scored in order to obtain the score each in- 
dividual in the two groups made on this Work 
Attitude Scale. The distributions for these 
groups are presented in Table 3. A compari- 
son of the scores was made. The number of 
responses on the Work Attitude Scale for the 
‘poor work attitude’ cases in the validation 
group ranged from 5 to 29 (Mean 16.4, S.D. 
7.4) while scores for the ‘satisfactory work 
attitude’ employees ranged from 3 to 20 
(Mean 7.0, S.D. 4.1). 


1 To save printing costs, a 3-page table listing the 
37 items in the Work Attitude Scale has been de- 
posited with the ADI Auxiliary Publications Project. 
Order Document No. 4080 from Chief, Photodupli-
cation Service, ADI Auxiliary Publications Project,
Library of Congress, Washington 25, D. C., remit- 
ting $1.25 for 35 mm. microfilm or $1.25 for 6 by 8 
inch photo-copies. 
A cut-off score was established where the
number of mis-identifications reached a mini- 
mum. Using a cut-off score of 13, 15 per cent 
of the ‘poor work attitude’ cases and 12 per 
cent of the ‘satisfactory work attitude’ em- 
ployee group were incorrectly identified. 
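
One way to locate such a cut-off is sketched below. The scores are hypothetical, the helper names are introduced for illustration only, and the rule that scores at or above the cut-off are classed as 'poor work attitude' is an assumption, since the text does not say on which side the critical score falls.

```python
def misidentifications(poor_scores, satisfactory_scores, cutoff):
    """Count errors when scores >= cutoff are called 'poor work attitude'."""
    missed_poor = sum(1 for s in poor_scores if s < cutoff)
    false_alarms = sum(1 for s in satisfactory_scores if s >= cutoff)
    return missed_poor + false_alarms


def best_cutoff(poor_scores, satisfactory_scores):
    """Return the lowest cutoff with the fewest total mis-identifications."""
    all_scores = poor_scores + satisfactory_scores
    candidates = range(min(all_scores), max(all_scores) + 2)
    return min(
        candidates,
        key=lambda c: misidentifications(poor_scores, satisfactory_scores, c),
    )


# Hypothetical scores, not the study's data.
poor = [5, 13, 15, 17, 19, 22, 25, 29]
satisfactory = [3, 4, 6, 7, 8, 9, 12, 20]
cutoff = best_cutoff(poor, satisfactory)
errors = misidentifications(poor, satisfactory, cutoff)
```

With these made-up scores the search settles on a single cut-off; with real data several neighboring cut-offs may tie, in which case this sketch returns the lowest.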

Admittedly, the items in Table 2 comprise 
a tentative scale which requires further vali- 
dation. Until comparative studies have been 
carried out, the writer wishes to emphasize 
the experimental nature of this scale. There 
is a possibility that work attitude may not be 
a general factor but rather may be highly 
specific to particular work situations. Some 
attrition of items could then be expected in 
cross validation. 

The writer plans to subject these items to 
further validation by studying the MMPI
scores of a group of men whose work perform- 
ance and work attitude at Columbia-Southern 
have been consistently merit rated as ‘more 
than satisfactory’ and a group of men termi- 
nated because they were either dissatisfied 
with their assigned work or working condi- 
tions. Further study with a freshman college 
population is also planned. 

An interesting generalization, however, can 
be made from the writer’s experience in in- 
dividually re-reading and scoring each deviant 
response. An unusually large proportion of 
‘poor work attitude’ individuals expressed con- 
cern over their bodily functions and believed 
that they were not in good health. As this 
was almost a chronic complaint, it suggests 
that a relationship exists between the Hypo- 
chondriasis Scale and the proposed Work At- 
titude Scale. 

The writer believes, however, that further 
validation of this scale would prove definitely 
advantageous for the purpose of screening out 
those individuals whose Work Attitude score 
suggests that they are poor risks for employment.
The problem of probable risk is an
important one in an employment situation.
Some of the resulting consequences of a poor 
work attitude are: (a) loss of productive 
time; (b) loss of time and effort expended in 
training a poor worker; and (c) negative in- 
fluence of a low morale worker on fellow 
workers. 

If an applicant is hired for permanent employment,
it should be with the knowledge
that his work attitude will enable him to con- 
tribute positively to the demands of the work 
situation and environmental needs of his co- 
workers. It may be that such a short scale 
could have wide use in screening applicants in 
the pre-employment situation. 


Summary 


From 58 MMPI items originally selected 
by three or more judges working in the area 
of personnel selection and testing as repre- 
senting insight into a potential employee's 
inner motivation and work attitude, 37 items 
were found to distinguish at the .01 level of 
confidence between a group of 60 male white 
‘poor work attitude’ air force personnel and 
a group of 50 ‘satisfactory work attitude’ in- 
dustrial employees equated in terms of educa- 
tion, sex, intelligence, age, occupation, and 
marital status. 

When the 37 items which distinguish at the
highest level of confidence are combined into 
a scale with unit weights and using 13 as a 
critical score, a Work Attitude Scale is ob- 
tained which correctly identified about 85 
per cent of ‘poor work attitude’ cases and 88 
per cent of ‘satisfactory work attitude’ em- 
ployees. 


Valid occupational interest patterns provide
one of the major bases of vocational
counseling. The usefulness of such profiles 
depends largely upon the nature of the sample 
used for their construction. Two factors, 
success and satisfaction of employees in a 
field, have been shown to affect the form of 
such profiles. 

A study by Hahn and Williams (4) with 
Marine Corps women revealed that certain of 
the Kuder scales distinguished satisfied from 
dissatisfied clerical workers. DiMichael and 
Dabelstein (2) added to the findings by re- 
cording significant relationships between the 
degree of satisfaction with their employment 
expressed by vocational rehabilitation coun- 
selors and their interests on the Kuder Prefer- 
ence Record. In a report by Barnette (1) 
occupationally successful and unsuccessful 
counseled veterans were distinguished on the 
basis of their Kuder measured interests. 

Two other factors which may alter the values of
occupational interest profiles are experience on the job
and the lack of major interest in other fields among the
individuals in the sample on which a particular vocational
pattern is based. These factors have been little considered
in the development of occupational Kuder preference norms.
Little evidence exists as to the similarity of interest
patterns between experienced and inexperienced workers in
various occupations. If the profiles of these two groups
differ, the practice of utilizing experienced persons as the
basis for vocational counseling is seriously handicapped.


1 This research was supported by a grant from the
Buhl Foundation.

2 Miss Russell is now on the staff of the Depart- 
ment of Child Study and Research, School District 
of the City of Erie, Pa. 


The desire to remain in the same occupa- 
tion may appear to be similar to satisfaction; 
however, many people who do not express 
discontent for their present occupation may 
nevertheless show a preference for employ- 
ment in some other area. How much of an 
effect such an expression has on occupational 
norms has not been determined. 

It was the purpose of this study to examine 
the similarities and differences on Form BI 
of the Kuder Preference Record: (a) between 
individuals experienced in various occupations 
and persons entering these same occupations; 
and (b) between individuals expressing an in- 
terest in an occupational area other than that 
in which they are experienced and those per- 
sons in the same field who profess no other 
occupational interest. 


Method 


Psychological Service of Pittsburgh, in dif- 
ferent phases of its services, has accumulated 
scores on the industrial form of the Kuder 
Preference Record for various occupations. 
All subjects were male adults whose interests 
and abilities have been measured as part of 
a larger testing program to select persons for 
promotion or employment. 

At the time of an initial interview, the sub- 
jects were asked to indicate the nature of the 
work in which they were presently engaged
as well as the positions for which they were 
applying. The members of each occupational 
group were then classified as: (a) entering 
the vocation for the first time (entry group);
(b) having had previous experience in the 
area in which they were seeking employment 
(experienced group); or (c) seeking employ- 
ment in a field other than the one in which 
they were experienced (other interest group). 

Subjects representing five occupations were chosen for
sampling: engineers, salesmen, production managers,
laboratory workers, and laborers. The samples used in this
study with the various breakdowns are shown in Table 1.


Table 1

Breakdown of Sample Studied

                                             Occupation
                         Engineering  Sales         Laboratory  Managerial     Laboring
                         DOT 0X74     DOT 1X55      DOT 0X70    DOT 084        DOT 6669
Entry (Inexperienced)    131          36            12          --             --
Experienced              123          92            29          27             49
Experienced with other   12           28            --          20             39
  Job Interest           (sales)      (various) *               (various) **   (machinist)

* Journalism 1, Business Relations 4, Engineering 5, Office Management 1, Routine Recording 1, Structural Crafts 1, Laboring Jobs 8.

** Sales 9, Engineering 4, Office Management 3, Drafting 1, Mechanical Repair 1, Machinist 2.


Means and standard deviations of each interest scale were
computed for all sub-groups of each occupational area. The
significance of any differences between mean scores of the
sub-groups was determined utilizing the “t” test. A
difference was accepted as significant if the “t” value was
beyond the 1% confidence level.
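
The comparison of sub-group means can be sketched from summary statistics alone. This is a minimal illustration under stated assumptions: the source does not say which form of the "t" test was used, so the pooled-variance form is assumed; the numbers are illustrative values on the scale of Kuder scores rather than a re-analysis; and `pooled_t` is a name introduced here.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t for two independent means, pooling the two variances."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_error, df


# Illustrative summary statistics (assumed values, not a re-analysis).
t_value, df = pooled_t(54.1, 9.3, 131, 48.8, 9.2, 123)

# 2.576 is the two-tailed 1% point of the normal curve, used here as a
# large-sample approximation to the critical t.
significant_at_1_percent = abs(t_value) > 2.576
```

For the small sub-groups (N = 12, for example) the normal approximation is too lenient, and the critical value should be taken from a t table for the appropriate degrees of freedom.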


Results 


Engineers. Entry engineers are found to 
have a higher mean on the Mechanical and 
Scientific scales and a lower mean on the 
Musical scale than experienced engineers. 
Engineers seeking sales positions have significantly
higher Persuasive interests than experienced
engineers. See Table 2, A.

Salesmen. Entry and experienced sales- 
men show similar interest profiles with no 
significant mean differences between them. 
Salesmen applying for jobs in other occupa- 
tional areas record a significant drop in Per- 
suasive scores. The larger standard deviation 
for the “other interests” sales group on the 
Mechanical scale is understandable in view 
of the variety of other occupations included. 
See Table 2, B. 

Laboratory Workers. A comparison of
entry workers with experienced laboratory
workers reveals a significantly higher Scientific
mean for the entry group. See Table 2, C.

Production Managers. A higher Persuasive 
mean is found for production managers with 
other occupational desires when compared 
with production managers seeking employ- 
ment in the same area. Almost one half of 
the “other interest’ group were sales appli- 
cants. See Table 2, D. 

Laborers. The Mechanical and Scientific 
means for laborers with machinist ambitions 
are significantly larger than the correspond- 
ing means for laborers desiring to continue in 
laboring jobs. The variances between the 
two groups on the Mechanical scale are sig- 
nificantly different but it is unlikely that the 
“t” ratio is produced entirely by the differ- 
ences in variances (3). See Table 2, E. 


Discussion 


Two generalizations are suggested by the
results presented. With respect to the effect
of experience on Kuder occupational norms,
the interest patterns of entry and experienced
workers are essentially similar. The entry
groups are often characterized by higher
scores on those interest scales which particularly
belong with their vocational fields.
Thus, it was shown that entry engineers were
significantly higher on the Mechanical and
Scientific scales and entry laboratory workers
higher on the Scientific scale than were their
experienced counterparts. These higher scores
for the entry group may stem from their
recent completion of training and their pre- 

Table 2


A Comparison of the Means and Standard Deviations between the Competitive Groups


Groups    Mech  Comp  Sci  Pers  Art  Lit  Mus  SSer


A. Engineers

Entry Engineers «= 7" 541 294 410 403 198 178 90 39.0 
(N = 131) 9.3 79 76 12.7 7.9 8.4 5.9 10.9 

* ** * 

Experienced Engineers 48.8 30.9 42.4 11.0 414 

(N = 123) 9.2 8.2 86 13.9 7.9 7.4 66 11.6 
- * 

Engineers with Sales Interests 47.7 273 378 56.6 23.9 128 348 

(N = 12) 7.9 68 78 9.7 7.6 7.1 $3 is 


B. Salesmen 
Entry Salesmen 21.6 61.9 21.6 
(N = 36) 6.6 a 8.6 r. 6.9 


Experienced Salesmen 22.7 62.0 19.0 

(N = 92) 3 8.4 Y 9.5 *. 7.8 
ee 

Salesmen with other Job Interests 53.2 20.7 

(N = 28) 18.5 ; . 9.8 8.1 


C. Laboratory Workers 
Entry Laboratory Workers (M) 48.1 42.3 16.0 
(N = 12) (a) 6.7 : 14.1 6.6 


Experienced Laboratory Workers (M) 468 . 40.3 20.2 
(N = 29) (a) 8.7 8.1 : 15.8 7.3 9.2 


D. Production Managers 
Production Managers (M) 52.9 25.1 36.1 21.3 16.1 
(N = 27) (a) 89 108 ; 99 8.1 10.1 
** 
Production Managers; other Job (M) 50.7 26.2 48.0 19.1 18.2 
Interests (N = 20) (a) 8.5 7.4 if 12.7 7.3 78 


E. Laborers 
Laborers (M) 494 264 37.2 246 17.3 
(N = 49) (a) 11.6 8.2 8.1 11.5 10.4 8.7 
oe * 
Laborers Desiring Machinist Jobs (M) 574 268 419 316 256 145 
(N = 39) (a) 6.2 5.4 9.6 9.2 8.0 6.2 








** The difference between adjacent means is significant at the 1% level of confidence. 


occupation with the subject matter in the
characteristic areas. In addition, it is highly
probable that the results reflect a slanting of
responses toward the desired occupational
choice. Applicants for jobs are apt to alter
or modify their responses to produce what
they feel are the desirable interest patterns.

A change of vocational goals is reflected
in the Kuder interest scores. The nature and
amount of such change depend upon the
type of work desired by the individuals seeking
new occupations. Engineers typically
have predominant interests on the Mechanical
and Scientific scales with an average Persuasive
interest. For those experienced engineers
desiring sales work, the dominant interest is
in the Persuasive area. A similar shift toward
Persuasive interest occurs for production
managers desiring other types of jobs. Almost
half of the “other interest” group were
sales applicants and their weight in the group 
accounted for the significant change in this 
scale. Contrariwise, experienced salesmen 
seeking employment outside of sales work 
show lower scores in Persuasive interests. 
This trend is also observed when comparing 
laborers with machinist ambitions with labor- 
ers content to fill laboring jobs. In this in- 
stance the groups are decidedly differentiated 
in the expected area of Mechanical interest. 

The explanation of consciously biasing re- 
sponses toward the desired area mentioned 
before would apply more specifically to the 
“other interest” groups. The reason for de- 
siring to change job areas is not known. 
Dissatisfaction with their present vocation 
could possibly have been the deciding factor 
with many persons in the “other interest” 
groups. Nevertheless, the factor of other job 
desires does become an important variable 
in the selection of samples for occupational 
norms. 


Summary 


1. This study was designed to determine
the similarity of Kuder interests between:
(a) entry and experienced workers; and (b)
experienced workers and experienced workers
with new occupational goals.

2. The interests of the entry groups are
basically similar to those of experienced persons
in the same occupation. The differences
found are in the direction of higher scores for
the entries on scales typical of the occupational
area. This similarity lends validity
to the practice of using interest profiles based
on experienced workers for vocational counseling.

3. It has been shown that Kuder interest
scores of persons seeking employment in a
new area differ from those of persons in similar
occupations who choose to remain in their present
vocational field. The particular scales on
which the differences occur follow the type of
work to which the change is being made.
Though conscious slanting of test responses is
likely in a situation in which employment is
involved, and the reason for many of the job
changes may have been dissatisfaction with the
present type of work, the differences found do
suggest the importance of the absence of other
job interests as a criterion in the selection
of samples for determining occupational
interest norms.


Effect of Viewing Angle and Parallax upon Accuracy of Reading


An important condition which affects the
accuracy of reading instruments that present 
quantitative information is the orientation 
of the instrument with respect to the observer's
line of sight. When an instrument
is displaced laterally from the point immediately
in front of the observer, the viewing
angle is decreased, and if the pointer is also
displaced from the plane of the dial, parallax
is introduced. It is well known that decreases
in viewing angle and the introduction
of parallax affect reading accuracy. Manu- 
facturers of precision instrument scales take 
considerable pains to eliminate these factors 
on scales which are to be read to close tol- 
erances. In more common situations, how- 
ever, where several instruments are displayed 
on a flat panel, as in aircraft, it is not feasible 
to construct instruments with precise pointer- 
locating devices on them, such as mirrors, etc. 
The usual design practice in such situations 
has been to restrict the location of instru- 
ments that require great reading accuracy to 
the center of the instrument panel, thus avoid- 
ing the problem of parallax for at least some 
instruments. 

Whether this latter practice is necessary
or desirable has been the topic of discussion
by several investigators concerned with
instrument dial legibility and design practice.
Recommendations by Calvert (3) and Du
Bois (4) suggest that when aircraft instruments
must be displaced laterally to the observer’s
forward line of sight, part or all of
the instrument panel (or each instrument
dial face) be tilted so the dial faces are
perpendicular to the line of sight. That this
kind of arrangement creates new difficulties
has been pointed out by Barr (1), who, while
agreeing that readability could be increased
by tilting the dial faces or curving the
instrument panel, mentions that space requirements
often make it impossible, lighting and
reflection problems are more difficult, and new
hazards are created (e.g., fouling during
emergency escape procedure). Kappauf (6),
however, suggests that for instrument panels
where space limitations and other factors
are not a serious consideration, panel shape
and instrument dial orientation might well be
a subject of careful study.


* The experiments reported here were performed at
Antioch College, Yellow Springs, Ohio, under Air
Force Contract No. AF 18(600)-50

It appears to be generally agreed that ex- 
cessively oblique views of instrument dial 
faces create serious reading errors and that 
such situations should be avoided in the 
design of instrument panels. But while these 
conceptions, based upon experience with dial 
and panel design, are sound, very little em- 
pirical evidence or theory exists to define pre- 
cisely the limits of reading accuracy which 
might be expected as a function of viewing 
angle and parallax. Some data on the subject 
have been accumulated by Bartlett and Mack- 
worth (2) in a study of errors made in locat- 
ing aircraft position when represented on a 
plotting board grid. These investigators col- 
lected extensive data on the number of gross 
location errors made when the plotting board 
was seen from various viewing angles and 
distances. But while these data suggest that 
decreasing viewing angle beyond a critical 
point (about 35 degrees) results in gross 
errors, the results are not directly applicable 
to an instrument reading task, due to the 
nature of the apparatus used and the condi- 
tions of their experiment. 

It would appear from the Bartlett and
Mackworth data that reading accuracy can
be expected to decline systematically as a
function of decreasing viewing angle and that
serious limitations on instrument location need
to be imposed if reading accuracy is not to
suffer. It appears desirable also to determine
what changes in accuracy could be expected
in the more special case of instrument dials,
in order to discover if any generality exists
for present empirical findings or for certain
postulated invariants. It is the purpose of
this paper to report on two exploratory
studies designed to determine the changes
in reading accuracy which might be expected
to occur as a function of decreasing viewing
angle and the introduction of parallax.


The Experiments


The first experiment was designed to determine
the effects of changes in viewing angle¹
upon reading errors without parallax effects
entering into the situation. Photographs of
dials were used to rule out the effects of
parallax. The photographs were presented
in a tachistoscope and read at various viewing
angles from 90 degrees to 25 degrees. A
total of 20 college students were used as
subjects in the experiment. All had normal
Snellen acuity and none had obvious visual
defects.

The apparatus consisted of a sliding mirror
tachistoscope with a switching mechanism so
that the subject could control the exposure
time. The back of the tachistoscope was
arranged so that the photographs could be
tilted either horizontally or vertically and
presented at all viewing angles between the
limits used. Ten of the subjects read the
dials tilted horizontally, and ten read them
tilted vertically. Viewing distance was 28
inches, and the brightness of the white parts
of the dials was seven foot-lamberts.


1 By viewing angle we mean the acute angle formed
by the intersection of the plane of the dial face and
the observer's line of sight.


Fig. 1. Dial types used in Experiment I. Dial Type 1: 600 × 10. Dial Type 2: 400 × 10.


Two kinds of dials, as shown in Figure 1,
were used in the experiment. One was a
600 × 10 dial, and the second was a 400 × 10
dial. Each subject was given ten practice
trials to familiarize him with the dials and
the apparatus. He was instructed to read the
dials to the nearest five units as accurately
and quickly as possible. Each subject was
given 40 test trials, 20 on each of the two dial
types, four at each of the ten viewing angles used.
Five of the pointer settings for each dial type
were presented in each of the quadrants of
the circle. For each dial, half of the settings
were nearest a graduation mark, and half
were nearest a mid-mark position. On each
trial the subject pressed a switch, opening
the shutter and exposing the dial. When he
had read the setting, he released the switch,
closing the shutter, and reported the apparent
setting to the experimenter, who recorded the
setting report and the time.

Results of the First Experiment. Analyses
of the data revealed no systematic change in
reading time as a function of viewing angle
within the limits studied for either dial type
or direction of slant. A fairly systematic
trend exists, however, for errors to increase
with decreasing viewing angles. A graph of the
percentage of the total readings in error by five
or more units made at each viewing angle by
all the subjects is presented in Figure 2. The
trend for errors to increase with decreasing
viewing angles is represented by a cosecant
function. This trend might also be represented
equally well by a straight line or many
other possible functions; the reason that a
cosecant curve was used will be discussed
later. The data for both dials and both
directions of slant are combined, since there
were no real differences between the two dials
or directions.


Fig. 2. Per cent readings in error by 5 units or more at each viewing angle. Experiment I.


Before discussing the results of the above
described experiment, let us first turn to the
second experiment, on the effects of parallax.
The second experiment was designed to isolate,
if possible, the effects of the introduction
of parallax² in various amounts upon errors
in pointer location. It was felt that the effects
of parallax could be studied best in a
situation which was fairly simple and in
which such sources of error as interpolation,
numeral identification, and other factors were
not present. Apparatus was therefore designed
so that the subjects could align a pointer with
a single mark against a homogeneous background,
on the assumption that the setting errors thus
measured would reflect the errors in perceived
location.

The apparatus for this experiment consisted
of a white cardboard upon which was
painted a single black line three inches high
by one-eighth inch wide. A black pointer
of similar dimensions was set in front of the
mark on a track to permit the pointer to slide
laterally to various positions in front of the
mark and background. The subjects controlled
the position of the pointer by pulling
alternately on a pair of strings. The experimenter
could read the position of the pointer
from behind the apparatus by referring to a
meter stick calibrated to the pointer position.
A white cardboard screen allowed the subject
to see only the background, black mark and
pointer.


2 By parallax we mean the amount of displacement
of the pointer from the plane of the dial face.


Seventeen college students acted as sub- 
jects. They were instructed to set the pointer 
on a line perpendicular to the plane of the 
mark and in line with the mark. The panel 
was set at various viewing angles between 90 
degrees and 20 degrees inclusive, both left and 
right. Pointer and mark were displaced by 
distances of 144, 4, 1, and 11% inches. View- 
ing distance was ten feet. Two settings, left 
and right, were made by each subject at each 
of nine viewing angles, left and right, and at 
each of the four displacements, a total of 136 
settings per subject. Sequences of presenta- 
tion of the viewing angles, displacements, and 
left and right settings were randomized for 
each subject. 
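
The stated total of 136 settings can be reproduced with a short enumeration. This is a sketch under assumptions: the individual angle values are not listed in the text (only the 90- and 20-degree limits and the count of nine are), the displacement labels are placeholders, and it is assumed that the 90-degree position has no left and right variant, which is one reading that yields the stated total.

```python
import random

# Assumed angle values: the text gives only the 90- and 20-degree limits
# and the count of nine viewing angles.
ANGLES = [90, 80, 70, 60, 50, 40, 30, 25, 20]
DISPLACEMENTS = ["d1", "d2", "d3", "d4"]   # the four pointer-mark displacements
SETTINGS = ["left", "right"]               # the two settings per condition


def build_schedule(seed):
    """List every (angle, panel side, displacement, setting) in random order."""
    panel_positions = []
    for angle in ANGLES:
        if angle == 90:
            panel_positions.append((angle, "center"))  # no left/right at 90 degrees
        else:
            panel_positions.append((angle, "left"))
            panel_positions.append((angle, "right"))
    schedule = [
        (angle, side, disp, setting)
        for angle, side in panel_positions
        for disp in DISPLACEMENTS
        for setting in SETTINGS
    ]
    random.Random(seed).shuffle(schedule)  # a separate random order per subject
    return schedule


schedule = build_schedule(seed=1)
n_settings = len(schedule)  # 17 panel positions x 4 displacements x 2 settings
```

Giving each subject a different seed produces an independent random sequence of the same conditions.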

Results of the Second Experiment. A preliminary
analysis of the error data revealed
that the distributions of errors at each viewing
angle were neither normal nor homogeneous
with respect to variability. The distributions
were symmetrical at the 90 degree position,
but as viewing angle was decreased the
distributions became both more skewed and
more variable. For this reason the medians
were used rather than the means as average
measures of errors at each viewing angle and
displacement.

Figure 3 represents the data of the experiment,
and is plotted in terms of the median
constant error (in which the direction of
error is considered) as a function of viewing
angle, for the four pointer-mark displacements
used. The zero point indicates correct
alignment, positive values indicate settings
in the direction away from the observer, and
negative values indicate settings toward the
observer, relative to correct alignment.
It can be seen from Figure 3 that a trend
exists here similar to that found in the first
experiment, namely, increasing error with
decreasing viewing angle. In addition, there
appears to be an inverse relation between
constant errors and displacement, the small
displacements yielding apparently greater
positive constant errors, while the large
displacements yield small or negative constant
errors (in the direction of the observer).
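
The difference between an error measure that keeps direction and one that ignores it can be shown with a few numbers. The values below are hypothetical setting errors, not data from the experiment, and the variable names are introduced for illustration.

```python
from statistics import median

# Hypothetical signed setting errors (inches) for one viewing angle and
# displacement; positive = set too far from the observer.
signed_errors = [0.4, -0.1, 0.6, 0.2, -0.3, 0.5, 0.1]

median_constant_error = median(signed_errors)                 # direction kept
median_average_error = median(abs(e) for e in signed_errors)  # direction ignored
```

Because positive and negative settings partly cancel, the signed median can be small even when the unsigned median, which reflects the size of the typical miss, is not.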

A similar trend appears for errors to increase
as a function of viewing angle if the
median average error (in which direction is
not considered) is plotted. Figure 4 contains
a graph of these data plotted for each
of the four displacements. The relation of
amount of displacement to error is not so
clear here as it was for constant error. The
amount of the error, as indicated by the
median constant error, appeared to be inversely
related to the amount of displacement.
In Figure 4, it appears that the greater the
displacement, the greater is the error variability,
as measured by the median average
error.

If all the data are combined and the median
average error is plotted as a function of viewing
angle for all displacements combined,
the data as shown in Figure 5 appear. The
curve fitted to these data is again a very close
approximation to a cosecant function; it appears
to fit the data quite well and in about
the same way as in the first experiment.


Median average error for all pointer-mark displacements at each viewing angle. Viewing distance was 10 feet.

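
A cosecant curve of this kind can be fitted with a one-parameter least-squares calculation. The sketch below is an assumption about method, since the text does not describe how the curves were fitted or whether a constant term was included; the data are synthetic and `fit_cosecant` is a name introduced here.

```python
import math


def fit_cosecant(angles_deg, errors):
    """Least-squares fit of errors = k * csc(angle), a line through the origin."""
    x = [1.0 / math.sin(math.radians(a)) for a in angles_deg]
    k = sum(xi * yi for xi, yi in zip(x, errors)) / sum(xi * xi for xi in x)
    return k


# Synthetic errors generated from k = 0.5 so that the fit can be checked.
angles = [90, 70, 50, 30, 20]
errors = [0.5 / math.sin(math.radians(a)) for a in angles]
k = fit_cosecant(angles, errors)
predicted_at_30 = k / math.sin(math.radians(30))
```

Treating csc(angle) as the predictor turns the curve into a straight line through the origin, so the single coefficient has a closed-form solution.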


Discussion 


There appear to be at least two effects in- 
volved in the experimental data. The first 
may be termed the effect of decreased ap- 
parent distance between successive points on 
the scale as a function of decreasing viewing 
angle; the second appears to be the effect of 
displacement between the pointer and scale 
plane. Apparently, for the limits of this 
study, large pointer displacements lead to 
small constant errors but large variable errors, 
while small displacements lead to large and 
consistent constant errors. 

These results may be interpreted, at least
in a preliminary way, by considering the
visual angle relationships which vary concomitantly
with changes in viewing angle.
A decrease in viewing angle results in a
corresponding decrease in the visual angle
subtended at the eye by a given mark separation.
It was noted earlier that in Figure 2 the data
were represented by a cosecant function.
They might have been fitted by a straight
line. However, since the projection of the
mark separation distance decreases proportionally
to the sine of the viewing angle, we
would expect errors to be inversely related to
the sine, or directly proportional to the cosecant,
of the viewing angle. At zero degrees
viewing angle, of course, the cosecant function
is infinite, and we would also expect errors
to be extremely large or erratic, since the
mark separation, as projected on the retina,
would be zero, and the dial face would probably
not be visible. The interocular distance
is purposely ignored in the geometrical
analysis.
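
The geometrical argument can be written compactly. The symbols are notation introduced here rather than the authors': s is the mark separation on the dial face, s' its projection perpendicular to the line of sight, θ the viewing angle, D the viewing distance, α the visual angle in radians, and E the expected error. The small-angle form assumes D is much larger than s.

```latex
s' = s \sin\theta, \qquad
\alpha \approx \frac{s'}{D} = \frac{s \sin\theta}{D}, \qquad
E(\theta) \propto \frac{1}{\sin\theta} = \csc\theta .
```

At θ = 90 degrees the projection equals the full separation, and as θ approaches zero the projected separation vanishes while csc θ grows without bound.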

It would be expected from this interpreta- 
tion that errors, in reading to a given criterion 
of accuracy, would remain substantially neg- 
ligible until the visual angle subtended by the 
criterion distance approached or diminished 
below some minimum discriminable angle. 
To compensate for this decreasing visual 
angle, it may be necessary, in a dial reading 
situation, only to increase the mark separa- 
tion so that the visual angle subtended by the 
distance representing criterion error tolerance 
remains above the minimum discriminable. 

As an example of this kind of interpreta- 
tion, we may consider a situation in which a 
normal observer is required to read quanti- 
tative scales. If the illumination is good, and 
if it is assumed that the accuracy of the 
readings does not require discriminations 
finer than about one minute of visual angle, 
the observer could discriminate points as 
separate if they are 0.008 inch apart at a 
viewing distance of 28 inches. Thus, under 
the conditions of the first experiment, since 
the mark separations were about 0.125 inch, 
the minimum discrimination necessary (for 
accuracy to the nearest five units) was about 
twice that of which the normal observer is 
capable. Rationally, gross increases in error 
frequency under these conditions would not 
be expected until the viewing angle was decreased
to a value of about 30 degrees. This
expectation is borne out reasonably well by 
the data. Presumably, under levels of il- 
lumination which are below that required for 
a discrimination of one minute of visual angle, 
a correction would be necessary for the decrease
in acuity at the lower level.
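
The two figures in this example can be checked with a short calculation. The sketch assumes the sine projection described in the geometrical analysis and takes the passage's ratio of about two between the required and the minimum discriminable separation at face value; the function names are introduced for illustration.

```python
import math


def size_for_visual_angle(distance, arc_minutes):
    """Linear size that subtends the given visual angle at the given distance."""
    return 2 * distance * math.tan(math.radians(arc_minutes / 60.0) / 2)


def critical_viewing_angle(frontal_ratio):
    """Viewing angle (degrees) at which a separation that is `frontal_ratio`
    times the threshold when viewed head-on shrinks to the threshold itself."""
    return math.degrees(math.asin(1.0 / frontal_ratio))


threshold_inches = size_for_visual_angle(distance=28, arc_minutes=1)  # about 0.008
angle_for_ratio_2 = critical_viewing_angle(2.0)                        # about 30
```

A larger mark separation raises the ratio and so lowers the viewing angle at which errors would be expected to rise sharply.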

It should be clear, of course, that the above 
interpretation may be applied only when 
parallax is not present (i.e., when scale marks 
and pointer are not displaced). It was seen 
from the results of the second experiment that 
different amounts of parallax lead not only 
to different constant errors (both in direction 
and amount), but to different variability as 
well. It can be seen from an examination of 
Figure 3 that there was a consistent tendency 
for the subjects to set the pointer too far 
away, to “overcompensate,” for the amount 
of parallax present and that this tendency 
became more pronounced with decreasing 
viewing angle. Tentatively, a definition has 
been constructed for parallax in terms of the 
visual angle subtended at the eye by the 
pointer-mark displacement distance, and a
theoretical function has been derived to ac- 
count for these kinds of errors. A study is 
planned to test this theoretical function in a 
scale reading experiment. It is hoped that 
such an approach will serve a two-fold pur- 
pose of providing a basis upon which to pre- 
dict the errors to be expected in a practical 
sense, and at the same time provide a better 
understanding of the functional relationships 
involved. 


Summary 


Two experiments were performed to eval- 
uate and quantify the effects of decreased 
viewing angle and parallax upon accuracy of
reading instrument scales. In the first ex-
periment, viewing angles were varied, and
subject-controlled tachistoscopic presentation 
of the stimulus dials was used. Dial photo- 
graphs were used as stimulus materials to 
isolate effects of viewing angle from those of 
parallax. The results show that reading 
errors increased as viewing angle decreased 
from 90 degrees to 25 degrees. Reading time 
was unaffected by changes in viewing angle. 

In the second experiment the effect of
parallax was studied by requiring the sub-
jects to align a movable pointer with a mark.
The apparatus was set at viewing angles be- 
tween 90 degrees and 20 degrees, and four 
pointer-mark displacements were used. The 
average error of the settings increased as the 
viewing angles decreased. The increase is 
approximated by a function proportional to 
the cosecant of the viewing angle. The con- 
stant error tends to increase systematically 
with viewing angle. With increasing pointer- 
mark displacement the average error tends to 
increase, but the constant error is inversely 
related to the amount of pointer-mark dis- 
placement. 
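
A function proportional to the cosecant of the viewing angle has a single free constant. The Python sketch below shows one way such a constant might be estimated by least squares through the origin. The error values are hypothetical, generated from the model itself for illustration, and the names `cosecant` and `fit_cosecant_scale` are not taken from the study.

```python
import math


def cosecant(angle_deg):
    """Cosecant of an angle given in degrees."""
    return 1.0 / math.sin(math.radians(angle_deg))


def fit_cosecant_scale(angles_deg, errors):
    """Least-squares estimate of k in the model: error = k * csc(angle)."""
    xs = [cosecant(a) for a in angles_deg]
    return sum(x * e for x, e in zip(xs, errors)) / sum(x * x for x in xs)


# Hypothetical, noise-free errors built from k = 1.5 (not data from the study).
angles = [90, 60, 45, 30, 20]
errors = [1.5 * cosecant(a) for a in angles]

k = fit_cosecant_scale(angles, errors)
print(round(k, 3))             # recovers the constant used above
print(round(cosecant(30), 3))  # errors at 30 degrees are twice those at 90
```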

The error curves in both experiments are 
approximated by a function proportional to 
the cosecant of the viewing angle. An inter- 
pretation in terms of a least discriminable 
visual angle is advanced to account for the 
results.


Visual Tracking: III. The Instrumental Dimension of Motion
in Relation to Tracking Accuracy 1


Robert S. Lincoln 
The Johns Hopkins University 


In the typical tracking task a human opera- 
tor is required to make corrective responses 
to visual cues provided by the relative posi- 
tions and rates of travel of a target and a 
cursor or follower. With modern equipment 
the operator seldom manipulates the cursor 
itself. Rather he effects his control through 
a mechanical or electrical device that links 
him to the controlled cursor. This connect- 
ing mechanism usually alters the relationship 
between the operator’s controlling motions 
and the actual movements of the cursor. The 
nature of the alteration is determined by the 
characteristics of the tracking instrument. 
Systematic variation of these characteristics 
defines the instrumental dimension of track- 
ing motions. 

Within this dimension three different types 
of instrumental alteration of motion are of 
special interest since all of them have been 
incorporated in remote-control tracking equip- 
ment. These alterations may be termed: 
(a) translations, (b) transformations, and 
(c) integrations of motion. Translated mo- 
tions directly reflect the characteristics of the 
motions made by the operator, but the trans- 
lated motions are either amplifications or 
reductions of the operator’s motions. Trans-
formed motions reflect the operator’s move-
ments only in a special way since the system
output is not a direct counterpart of the mo- 
tion input (2). In the tracking device con- 
sidered in this paper, the transforming instru- 
ment changes an input of extent of move- 
ment into an output of velocity of movement. 


1 This report is based on a dissertation submitted 
in partial fulfillment of the requirements for the de- 
gree of Doctor of Philosophy at the University of 
Wisconsin in 1952. The research was conducted un- 
der the direction of Dr. Karl U. Smith to whom the 
writer is greatly indebted. Support for the research 
was provided by the Research Committee of the 
Graduate School from special funds voted by the 
Legislature of the State of Wisconsin. Reported at 
the 1953 meeting of the Eastern Psychological Asso- 
ciation. 


Integrations of motion involve the combina- 
tion into one output of a simultaneous trans- 
lation and transformation of the same move- 
ment of the operator. 

The various instrumental alterations of mo- 
tion that have been described are produced 
by direct, velocity, and aided tracking sys- 
tems, respectively. Table 1 indicates the 
component motions required to achieve con- 
trol of the position and rate of travel of the 
cursor in each of the three types of tracking, 
and the alterations of these motions that are 
produced by the different tracking systems. 

As Table 1 shows, direct tracking involves 
two translations of motion while the velocity 
tracking mechanism produces two transforma- 
tions of motion. In aided tracking an in- 
tegration of the simultaneous translation and 
transformation of the same positioning motion 
is developed by the tracking device. 
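
The three alterations can be made concrete with a small numerical sketch in Python. It is an illustration under stated assumptions, not a description of the apparatus: the function name, the unit gain, and the rectangular integration step are hypothetical, and the aided case assumes the usual convention that the time constant is the ratio of the position component to the rate component.

```python
def simulate_cursor(handwheel, dt, mode, gain=1.0, time_constant=0.5):
    """Return cursor positions for a sequence of handwheel displacements.

    direct   : cursor position follows handwheel displacement (translation)
    velocity : cursor rate follows handwheel displacement (transformation)
    aided    : both at once, combined into one output (integration)
    """
    positions = []
    integral = 0.0  # running integral of handwheel displacement
    for h in handwheel:
        integral += h * dt
        if mode == "direct":
            cursor = gain * h
        elif mode == "velocity":
            cursor = gain * integral
        elif mode == "aided":
            # Assumed convention: rate gain = position gain / time constant.
            cursor = gain * h + (gain / time_constant) * integral
        else:
            raise ValueError("mode must be 'direct', 'velocity', or 'aided'")
        positions.append(cursor)
    return positions


# A handwheel held at one unit of displacement for one second.
held = [1.0] * 10
for mode in ("direct", "velocity", "aided"):
    print(mode, round(simulate_cursor(held, 0.1, mode)[-1], 3))
```

With the handwheel held still, the direct cursor stays where it is, the velocity cursor drifts at a constant rate, and the aided cursor does both at once.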

This study was designed to provide in- 
formation related to three main questions: 


Table 1
Instrumental Alterations of the Operator Motions Required to Achieve
Control of the Position and Rate of Travel of the Cursor

Type of
Tracking    Positioning Motion            Rate Motion

Direct      Translated into cursor        Translated into rate of
            positioning                   cursor travel

Velocity    Transformed into direction    Transformed into rate of
            of cursor travel              cursor travel

Aided       Translated into cursor        Transformed into rate of
            positioning                   cursor travel; the
                                          translation and
                                          transformation are
                                          integrated.


1. Are the characteristics of the curve of
skill acquisition in tracking changed by the 
different instrumental alterations of motion? 

2. Are the effects of practice sufficient to 
overcome differences that may exist in the 
difficulty of operation of control systems that
produce translations, transformations, or in- 
tegrations of control movements? 

3. What transfer effects appear when train- 
ing on one type of tracking is followed by 
transfer to another type? 


Apparatus and Procedure 


The apparatus used in this study has previ- 
ously been described in some detail (3, 4, 5). 
The operator’s task is to align a cursor and a 
moving target by means of a handwheel control. 
The target moves over a circular course that in- 
cludes numerous reversals in the direction of 
target movement and continuous changes in target 
velocity. 

In order to compare the levels of accuracy that 
are obtained with direct, velocity, and aided con- 
trols, a special device has been constructed that 
permits a rapid change from one type of tracking 
to another without the alteration of other criti-
cal features of the tracking task.

Tracking-accuracy scores are obtained with a 
mechanism which integrates the tracking error 
record and provides a summated accuracy score 
on the dial of an electric clock. 

Prior to the experimental comparison of the 
different types of tracking, data were obtained on 
the optimum ratios of handwheel-to-cursor dis- 
placement (4). These optimal ratios were used 
in the comparison of the three types of tracking 
in order to insure that obtained differences in ac- 
curacy levels would not be a reflection of arbi- 
trarily chosen displacement ratios. For all ratios 
used in aided tracking, an aided tracking time 
constant of 0.5 second was maintained (6). 

Three groups of 18 subjects each were used in 
the study of training and transfer of training ef- 
fects. Subjects were randomly assigned to train- 
ing groups, and a different group of subjects was 
trained on each of the three types of tracking. 
The training sessions extended over a period of 
six successive days. On each training day all 
subjects received ten one-minute tracking trials. 
A 25-second rest pause was permitted between 
trials. 

Before the training trials were begun, all sub- 
jects were given an explanation of the tracking 
task and of the mechanical features of the con- 
trol that they would use. A brief demonstration 
of the control was also provided. This same ex-
planation and demonstration was given to sub-
jects who transferred to a new type of tracking
during the transfer trials. Subjects received no 
information concerning their accuracy in track-
ing other than that provided by the visual dis-
play of the apparatus.

The effects of transfer were observed on the 
seventh day of the experiment. At this time six 
subjects from each of the training groups re- 
ceived ten trials on one of the two types of 
tracking on which they had not been trained. 
Six different subjects from each of the training 
groups received ten trials on the second of the 
two types on which the groups had received no 
training. The remaining six subjects in each 
training group continued with the type of track- 
ing on which they had been trained. These con- 
trol subjects made up the additional-practice 
groups. A matching procedure was used to 
equate the various transfer and control groups 
on the basis of their accuracy scores achieved 
during the training period. 
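
The matching procedure is not described in detail. One common way of equating subgroups on a prior score is to rank the subjects, take them in successive blocks of three, and assign the members of each block at random to the three conditions. The Python sketch below illustrates that approach only; the scores, subject labels, and condition names are hypothetical and the method is an assumption, not the author's stated procedure.

```python
import random


def matched_assignment(scores, conditions, seed=0):
    """Split subjects into equal groups matched on a prior score.

    Subjects are ranked by score and taken in blocks the size of the
    number of conditions; each block is spread at random over conditions.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = {c: [] for c in conditions}
    k = len(conditions)
    for start in range(0, len(ranked), k):
        block = ranked[start:start + k]
        order = list(conditions)
        rng.shuffle(order)
        for subject, condition in zip(block, order):
            groups[condition].append(subject)
    return groups


# Hypothetical training scores for the 18 subjects of one training group.
scores = {f"S{i:02d}": 50.0 + i for i in range(18)}
groups = matched_assignment(scores, ["same", "transfer_1", "transfer_2"])

for name, members in groups.items():
    mean = sum(scores[s] for s in members) / len(members)
    print(name, len(members), round(mean, 2))
```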


Results 


Results of Practice. Performance curves 
for all training groups are shown in Figure 1. 
In this figure mean accuracy scores are plotted 
for each trial. The training curves indicate 
that those subjects who trained on direct 
tracking made the highest mean accuracy 
score on every trial throughout the entire 
training period. Those subjects who trained 
on velocity tracking made the lowest accuracy 
scores. The accuracy of aided tracking con- 
sistently fell between these two extremes al- 
though, after the first few trials, aided ac- 
curacy closely approached direct accuracy. 

From these results it is apparent that the 
mechanical devices developed as an aid to the 
tracking operator are of no general value 
under the present experimental conditions. 
It is quite possible, however, that different 
results might be obtained with a more uni- 
form target course, or in situations that re- 
quire the operator to track continuously for 
long periods of time. 

In order to evaluate the significance of 
some of the characteristics of the practice 
curves, a test of trend (1) was applied to the 
training data. In this test the mean accuracy 
scores for days, rather than trials, were used. 
Before the test of trend was carried out, it 
was necessary to perform an arc sin trans- 
formation (7) of the mean scores after each 
mean score had been calculated as a per-
centage of the maximum possible accuracy
score. This procedure was required to reduce 
a negative correlation between the means and 
variances of the training groups.

Fig. 1. The levels of accuracy reached by groups of subjects who
trained on direct, aided, or velocity tracking. The vertical lines
indicate the blocks of ten trials which made up a day’s run.
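
The arc sine transformation applied to the percentage scores before the test of trend can be illustrated briefly. The exact form used is given only by citation (7); the Python sketch below assumes the usual angular transformation, the arc sine of the square root of the proportion, expressed in degrees. The function name is illustrative.

```python
import math


def arcsine_transform(score, max_score):
    """Angular transformation of a score expressed as a proportion.

    Assumption: the common form arcsin(sqrt(p)), returned in degrees.
    """
    p = score / max_score
    if not 0.0 <= p <= 1.0:
        raise ValueError("score must lie between 0 and max_score")
    return math.degrees(math.asin(math.sqrt(p)))


# Proportions near 0 or 1 are stretched relative to those near 0.5,
# which tends to weaken the dependence of the variance on the mean.
for pct in (25, 50, 75, 95):
    print(pct, round(arcsine_transform(pct, 100), 2))
```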


According to the trend test, the training 
curves show significant (p < .001) deviation 
from linearity. The curves for the separate 
groups also differ significantly in regard to 
the degree with which they depart from line- 
arity. In addition, the slopes of best-fitting 
straight lines differ between groups. 

Results of Transfer. The data obtained 
during the transfer trials were subjected to 
an analysis of variance. Before the analysis 
was begun, the accuracy scores were trans- 
formed in the same manner as the training 
scores. The analysis of variance showed all 
of the main sources of variation to be sig-
nificant (p < .001), but a significant inter-
action (p < .001) between training and trans-
fer types of tracking greatly modified the
importance of the main variables. This re- 
sult indicates that the effects of training are 
highly specific in nature since no type of 
training led to superior over-all performance 
when transfer was made to all three types of 
control. Rather the effects of training de- 
pended upon the type of tracking to which 
transfer was made. 


Figure caption: Positive and negative transfer effects. The zero point on each
graph represents the initial accuracy level achieved by untrained subjects on the
transfer types. Plus deviations from the zero line indicate positive transfer
effects, while minus deviations indicate negative transfer effects. All graphs
show the type and amount of transfer effect produced when subjects are trained
on direct tracking (D), velocity tracking (V), or aided tracking (A), and later
transfer to the same or a different type of tracking.

As has been pointed out, direct tracking
involves two instrumental translations of the
operator’s control motions, while the velocity
mechanism produces two transformations of
the operator’s motions. Aided tracking in-
volves one translation and one transformation
of motion. Direct and velocity tracking, 
therefore, have no common instrumental re- 
lationships while aided tracking has one rela- 
tionship in common with both direct and 
velocity tracking. Figure 2 shows that the 
amount of transfer, as measured by accuracy 
scores, is directly related to the number of 
instrumental relationships that are common to 
both the training and transfer tasks. The 
figure indicates (for example) that training 
on aided tracking led to greater accuracy 
upon transfer to the velocity control than 
did training on direct tracking. A similar 
interpretation may be applied to the other 
points on the curves. 

In Figure 2 the additional-practice groups 
are shown as having “transferred” to the same 
type of control as the one on which they had
trained. The relative positions of these
groups indicate that direct tracking was still
slightly superior to aided tracking on the
seventh day of the experiment.

Another suggestion of transfer effects may 
be obtained from a comparison of the ac- 
curacy scores achieved by untrained subjects 
on a given type of tracking with the scores 
made by subjects who transfer to that type 
following training on another type. Figure 3 
pictures this kind of transfer effect. The zero 
point on the ordinate of each graph repre- 
sents the mean score made on the first ten 
training trials by the 18 subjects who trained 
on each of the transfer types. The remaining 
ordinate values indicate the amount and di- 
rection of the differences between the mean 
scores for the untrained subjects and mean 
scores made by the six subjects who trans- 
ferred to the different transfer types after 
training on either direct, velocity, or aided
tracking. In Figure 3 the additional-practice
groups are again shown as having “trans- 
ferred” to the type of control on which they 
were trained. 

The significance of these transfer effects 
was evaluated by means of t tests that were
performed on untransformed accuracy scores 
after F tests had indicated the homogeneity 
of the variances of the different training and 
transfer groups. All transfer effects are sig- 
nificant (p < .05) with the exception of that 
effect produced by transfer to velocity track- 
ing after training on direct tracking. 
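
The two steps named here, a variance-ratio check followed by a t test, can be sketched in Python with the standard library alone. The two samples are hypothetical and the function names are illustrative. The pooled-variance form of t is assumed, since that is the form which homogeneity of variance would justify; the text does not state which form was used.

```python
import math
from statistics import mean, variance


def variance_ratio(a, b):
    """F ratio of the larger sample variance to the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / std_error, df


# Hypothetical accuracy scores, not data from the study.
untrained = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
transferred = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

f_ratio = variance_ratio(untrained, transferred)
t_value, df = pooled_t(untrained, transferred)
print(f_ratio, round(t_value, 4), df)
```

The resulting statistics would then be referred to tables of F and t at the chosen level of significance.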

These results indicate that the prediction 
of transfer effects must take into account the 
direction of transfer as well as the num- 
ber of instrumental relationships common to 
both the training and transfer tasks. Train- 
ing on direct tracking produced a positive 
effect upon transfer to the aided control and 
no significant effect upon transfer to the 
velocity control, while training on either 
velocity or aided tracking produced a nega- 
tive effect upon transfer to the direct control. 


Summary and Conclusions 


This study was designed to provide in- 
formation concerning the acquisition and 
transfer of skill in the operation of remote 
control devices which produce instrumental 
translations, transformations, and integra-
tions of the operator’s controlling motions.
These instrumental alterations of response are
produced by direct, velocity, and aided track- 
ing systems. 

Each of three groups of 18 subjects re- 
ceived training on either direct, velocity, or 
aided tracking for a period extending through 
six successive days. On the seventh day of 
the experiment 12 subjects from each train- 
ing group transferred to different types of 
tracking while the remaining six subjects in 
each group continued to track with the con- 
trol on which they had been trained. Ac- 
curacy scores achieved by the subjects were 
analyzed with regard to the effects of both 
practice and transfer. The results of the ex- 
periment suggest a number of conclusions. 

1. The instrumental characteristics of con-
trol devices are prime determinants of the
accuracy with which those devices may be
operated. 

2. Practice curves for the three types of 
tracking show the typical negative accelera- 
tion which has previously been demonstrated 
in studies of direct tracking behavior. 

3. For complicated target courses, the ac- 
curacy of direct tracking is consistently su- 
perior to both aided tracking and velocity 
tracking. Aided control is also far superior 
to velocity control. 

4. Within the limits of this experiment, the 
effects of practice are not sufficient to elimi- 
nate the differences in accuracy achieved with 
the three types of tracking. 

5. The effects of training are highly spe- 
cific in nature. The best performance in 
transfer to any type of tracking is achieved 
by subjects who are trained on that specific 
type. 

6. Negative transfer effects appear when 
subjects transfer from aided or velocity track-
ing to the direct control, while positive trans-
fer effects appear in the reverse situation.
The amount of transfer is directly related to 
the number of instrumental relationships that 
are common to both the training and transfer 
tasks. 


Received January 12, 1953.


Identification of Cola Beverages Overseas *


E. Terry Prothro 
Brooklyn College 


Bottling and sale of cola beverages is now 
taking place in many countries around the 
world, and consumption of these beverages 
has become a part of the life of inhabitants 
of every continent. Both Coca-Cola and 
Pepsi-Cola are widely sold in Lebanon, for 
example, and their popularity has stimulated 
the production of several imitations. It there- 
fore seems worthwhile to determine whether 
or not consumers can differentiate the Ameri- 
can beverages from each other and from the 
local colas on a basis of taste. 

A series of investigations by Pronko and 
others indicated that subjects could not 
identify American colas better than chance 
when many different brands were presented 
(1), but that they could identify Coca-Cola 
significantly more often than chance when 
only the three leading brands were used (2). 
It was therefore decided to use only three 
beverages, including Coca-Cola, in this in- 
vestigation, so that there would be maximum 
opportunity to reveal taste differences be- 
tween the beverages. 


Procedure 


The three leading cola drinks of Lebanon 
were used in this study. These three, in 
order of popularity are Coca-Cola, Pepsi- 
Cola, and Williams Champagne Cola. The 
American colas are bottled in local plants but 
according to a special and presumably secret 
process dictated by the parent corporations. 
Champagne Cola is produced by the Williams 
plant, a Lebanese organization which also 
produces many other soft drinks. This cola
resembles the American drinks in appearance.
It was introduced after the early success of
Coca-Cola. Coca-Cola has been distributed
there for nearly three years, Pepsi-Cola for
six months, and Williams Cola for more than
one year.

* This study was conducted overseas while the au-
thor was teaching at the American University of
Beirut.


A total of 60 students of the American Uni- 
versity of Beirut volunteered to serve as sub- 
jects. Each subject stated that he was fa- 
miliar with the taste of the beverages used, 
that he was not suffering from a cold, and that
he had no political or religious objection to 
any of the beverages. He was then told that 
he would be given each of the three colas in 
series, and that he was to identify each after 
its presentation. Approximately 2 oz. of 
refrigerated cola were used at each presenta- 
tion. The beverages were presented in iden- 
tical 6 oz. glasses. Subjects were blindfolded 
during the trials. Approximately one minute 
elapsed between each trial, during which time 
the subjects were asked to rinse their mouths 
with water. There are six possible arrange- 
ments when three colas are presented in series. 
Each of the six arrangements was used for 
ten subjects. 
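
The counterbalancing can be checked by enumeration. The Python sketch below lists the six orders and builds a schedule of ten subjects per order; the variable names are illustrative and the sequence in which subjects were actually run is not given in the text.

```python
from itertools import permutations

colas = ["Coca-Cola", "Pepsi-Cola", "Champagne Cola"]

# Every order in which three different colas can be presented in series.
orders = list(permutations(colas))

# Ten subjects per order gives the 60 subjects of the study.
subjects_per_order = 10
schedule = [order for order in orders for _ in range(subjects_per_order)]

print(len(orders), len(schedule))  # 6 60

# Each cola appears equally often in each serial position.
for position in range(3):
    counts = {c: sum(1 for s in schedule if s[position] == c) for c in colas}
    print(position + 1, counts)
```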


Results 


Although the subjects were informed that 
each of the three colas would be presented, 
some of them felt that instructions in a psy- 
chological experiment cannot be relied upon. 
One subject believed that a single cola was 
presented three times, and some believed that 
other colas were being presented. From Table 
1 it can be seen that the most recently intro- 
duced cola (Pepsi-Cola) was named most 
often. This fact may be a result of the ex-
tensive advertising campaign that accom-
panied its introduction.


Table 1
Identification Responses of Subjects when Presented
with Three Cola Beverages

                           Response
Cola          Coca-    Pepsi-    Champagne
Presented     Cola     Cola      Cola         Other    Total

Coca-Cola      24       30          5           1        60
Pepsi-Cola     26       28          5           1        60
Champagne       1        6         51           2        60
Total          51       64         61           4       180

Our subjects were not able to differentiate 
between the two American colas. Indeed 
Coca-Cola was called Pepsi-Cola more often 
than it was named correctly. On the other
hand, the subjects did identify the local cola 
quite well, and showed little tendency to con- 
fuse it with the American colas. If we employ 
the chi-squared test of significance, we find 
that the American colas are identified cor- 
rectly only slightly, and insignificantly, more 
often than chance. The correct identification 
of Williams Champagne Cola was not at- 
tributable to chance. The superiority to 
chance was significant at the .001 level. 
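
The form of the chi-squared test is not stated. As an illustration only, the Python sketch below compares correct and incorrect identifications of one cola with a chance expectation, assuming a chance probability of one in three. That chance model and the function name are assumptions introduced here, so the values need not agree with those the author computed.

```python
def chi_square_vs_chance(correct, n, p_chance):
    """Chi-squared (1 df) for correct/incorrect counts against chance."""
    wrong = n - correct
    expected_correct = n * p_chance
    expected_wrong = n * (1.0 - p_chance)
    return ((correct - expected_correct) ** 2 / expected_correct
            + (wrong - expected_wrong) ** 2 / expected_wrong)


# Correct identifications out of 60 presentations, taken from Table 1.
champagne = chi_square_vs_chance(51, 60, 1.0 / 3.0)
coca_cola = chi_square_vs_chance(24, 60, 1.0 / 3.0)

# Tabled critical values of chi-squared with 1 degree of freedom.
CRITICAL_05 = 3.841
CRITICAL_001 = 10.828

print(round(champagne, 3), champagne > CRITICAL_001)
print(round(coca_cola, 3), coca_cola > CRITICAL_05)
```

Under this assumed chance model the local cola far exceeds the .001 criterion, while Coca-Cola falls short of the .05 criterion.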


Summary 


A total of 60 students at the American Univer-
sity of Beirut were asked to identify Coca-
Cola, Pepsi-Cola, and a popular local cola
without reliance on visual stimuli. Although 
these colas are widely distributed locally, and 
the subjects stated that they were familiar 
with them, it was found that the American 
colas could not be differentiated from each 
other. The local product could however be 
distinguished from the American brands in 
spite of the fact that it is an imitation of 
them. 


Received March 3, 1953.


Applied Psychology in Action


The Non-Directive Approach in Advertising Appeals 


Howard D. Hadley 


Daniel Starch and Staff, Mamaroneck, N. Y.


In recent years a new basic approach to 
psychotherapy has been developed. It is 
called the non-directive method. Because it 
has some implications for advertising, there 
may be value in describing this technique 
along with its counterpart—the directive 
method. 

Non-directive psychotherapy is built around 
these central concepts: 

1. That all individuals have the basic ca- 
pacity to understand the forces in their lives 
which cause them unhappiness and pain.
Moreover, they also can understand those 
forces which lead to pleasure and well-being. 

2. That all individuals, by personal effort, 
ultimately can overcome the bad or enhance 
the good forces in their lives. 

3. That this process is made easier, quicker, 
or more effective in an atmosphere that is 
friendly, sincere, and understanding. 

In non-directive therapy, the therapist does 
not assume any of the usual responsibilities 
such as prescribing treatment or even directly 
defining the cure. Instead, the therapist at- 
tempts to set up with the individual a rela- 
tionship, an atmosphere, in which the person 
may talk or act without danger of being cri- 
ticized. It is one of complete acceptance. 
Within this tender environment, the person 
himself comes to understand and to re-evalu- 
ate the forces operating to make him happy or 
unhappy. 

Directive therapy differs in that it usually 
requires a direct assault upon the individual’s 
maladjustments: the person is tested, ana- 
lyzed, and then told what is wrong and given 
a prescription for a cure. It is similar to 
going to a doctor only to find out you have 
appendicitis. He puts you in a hospital, re- 
moves the diseased organ, and your body 
then is able to complete the recovery. When 
dealing with the physical body, the doctor’s 
obligation is often greater than the patient’s. 


When dealing with a person’s mind, this 
method sometimes falls down because the per- 
son is reluctant to believe or is unable to 
accept the diagnosis of the therapist. The 
therapist can only discover and point out the 
path, he cannot walk it. Many persons bene- 
fit from this advice but many others do not. 

In advertising it is very similar: there are 
two ways in which to advertise. The adver-
tiser can tell the person to buy the product
because it will do this or that for him. This 
is comparable to the directive method where 
the individual is told what his troubles are and 
how to cure them. Or the advertiser can
create a friendly, sincere, and understanding 
atmosphere which shows the benefits of the 
product without direct intention to sell. This 
places the person in a situation where he is 
able to accept new ideas without threat to his 
old ideas. 

To complete the comparison, directive 
therapy is similar to the direct appeal in ad- 
vertising. In each case, the person to be 
influenced is directly told what is wrong and 
how to correct it. The non-directive therapy 
can be compared with the inferred appeal in 
advertising. 

The inferred technique usually utilizes as- 
sociation of the product with very acceptable 
things, persons, or events. The acceptability 
of the associated “thing” creates an attitude 
in which the individual feels free to accept 
new ideas. Also, the acceptability of the 
“thing” is transferred to the product. Ex- 
amples of advertisers using the inferred appeal 
are Modess, John Hancock, Breck, Old Gold, 
and Seagram’s 7 Crown. Each of these adver- 
tisers avoids a direct assault upon the con- 
sumer’s credibility but disarms him and then 
introduces a strong simply-worded message. 

Here are some examples from current ad-
vertising which help to illustrate the compari-
son. Take the Philip Morris theme, “Some-
thing Wonderful Happens—.” Here is an at-
titude that some persons actually attain after
changing to Philip Morris. At the present 
time, this very personal theme is put across in 
a very direct manner. Readers are now being 
told that “Something Wonderful Happens 
when they change to Philip Morris.” It is 
entirely possible that the theme—which is 
excellent—might be more effective if readers 
were to arrive at this realization without such 
obvious aid. To be told about this antici- 
pated change in the personality tends to act 
as a threat, which leads to resistance and 
withdrawal. The point could be put across 
showing this “Something Wonderful” hap- 
pening to others and then associating it in- 
directly with Philip Morris. 

Basically, persons earnestly want to be- 
lieve advertising, but they are afraid to. This 
fear is the result of the possible “frustration” 
they may suffer because of false and mislead- 
ing advertising when the product fails to live 
up to a person’s expectations. If this happens 
with advertising which uses the direct appeal, 
the consumer blames the advertiser. If this 
failure occurs with a product which has used 
inferred appeals, the consumer is less apt to
complain 1) because of lack of specific prom-
ises in the advertising and 2) because he was
the one who decided what the product would 
do—and not the advertiser. 

To remain in the cigarette field, let’s take
the king size cigarettes. A person who uses
the regular length cigarettes is told that he 
is not being economical and is also remiss in 
attention to his health. While both of these 
points have the popular vote behind them, a 
person will tend to treat the advertising as a 
threat to his judgment. Such advertising is 
a negative approach to a negative appeal. 
The inferred (non-directive) appeal would 
show a person enjoying increased smoking 
pleasure from a longer smoke. 

Throughout this whole comparison, there 
are two distinct philosophies of advertising. 
One of them is to “tell them” and the other 
is to “have them tell themselves.” From evi- 
dence in psychotherapy, and from evidence 
presented in the October 1952 issue of this 
magazine [Advertising Agency], it appears
that when credibility is lacking on the part of
the person, the latter (inferred) method is
more effective. 

Let’s look at the record of some beer ad- 
vertisers who use, in greater or lesser degrees, 
the inferred technique. Beer is taken as an 
example because beer is almost universally 
used; more money is spent for it each year
than for milk or cigarettes. Also, the re- 
sponse of consumers to beer advertising is 
quicker and greater than for many other mass 
distributed and used products. From 1949 
through 1951 there was a four per cent de- 
crease in the amount of beer consumed. 
However, the leader, Schlitz, gained about 20 
per cent. Rheingold had close to a 50 per 
cent gain, and Miller’s almost doubled. While 
there may have been other factors operating, 
it’s hard to deny credit to the type of adver- 
tising these beer companies have been using. 
As a matter of fact, if you look at the top 
brand in any product group, you will usually 
find that they have used the non-directive or 
inferred approach. 

If such an approach is so good, why don’t 
more persons use it? Here are some possible 
reasons: 

1. Most advertisers (not agency personnel) 
cannot resist the temptation to “tell them”— 
to sell the product as an extroverted salesman 
would. This is a very easy pitfall into which 
to fall, and a hard one to leave. 

2. Not everyone is able to use the inferred 
(non-directive) technique. To a large extent 
it is dependent upon the personality of the 
creative persons. Some persons think and act 
in a non-directive manner. They are friendly, 
sincere, and understanding. Because of these 
personality characteristics, they are able to 
create advertising which is in keeping with 
their temperament. Other persons are of a 
different turn. They are better at employing 
stronger, more obvious and promotional meth- 
ods to get across a point. This is not in- 
tended as a criticism since there is certainly 
room in the advertising field for both. How- 
ever, don’t expect one type of person to turn 
out a different type of advertising. It is hard 
(and uneconomical) to “live a lie’ and the 
consuming public is quick to catch insincerity 
in advertising.

3. The direct method still sells goods. This
is a potent argument. While there are many 
persons who are influenced by it, it seems as 
though an advertiser should use the inferred 
approach if he wishes to be really big. For 
this approach appears to be effective with the 
greatest number of people. 

To summarize the high points: 

1. There is a close parallel between con-
cepts used in psychotherapy and advertising.

2. The non-directive technique is quite com-
parable with the inferred technique in adver-
tising as exemplified by Modess, Breck, John
Hancock, Old Golds, and Seagram’s 7 Crown.

3. Both techniques appear to be successful
because the “patient” or “consumer” arrives
at judgments by himself without being di-
rectly told.

4. Successful use of both methods is very
dependent upon the personality of the per-
sons involved—therapists or creative persons.

Howard D. Hadley


Reprinted from Advertising Agency, April, 1953 


Reading: Stop Wasting Your Time 


Take a look at your mail. In addition to 
the usual flood of letters, ads, memos, it 
probably contains a couple of newspapers, a 
rash of magazines, an occasional book or two, 
and a shower of releases, pamphlets, broad- 
sides, etc. Management personnel spend 
about 15 hours a week just reading. And in 
many jobs you may well spend more. But 
how much of it is wasted? Tests show the 
average businessman reads only slightly better 
than an eighth-grade schoolboy—and that is 
still above the national level. Trouble is, few 
people have ever received any reading training 
after elementary school. More and more ex- 
ecutives are aware of this handicap, are turn- 
ing to reading development firms like The 
Reading Laboratory in New York and Chi- 
cago’s Foundation for Better Reading. These 
groups specialize in training executives to read 
faster, better. Goal of the 20-hour course is 
a reading speed of 650-700 words per minute. 
National average is about 250 words. Many 
taking the course do far better. 


Reading Laboratory Director K. P. Bald- 
ridge points out that reading speed will, of 
course, vary with the difficulty of the material. 
But: you can read even legal and scientific 
matter faster with proper training. 

Procedure starts off with an eyesight check, 
follows with photos of eye motion. The Lab 
also uses a battery of diagnostic tests to de- 
termine vocabulary level, reading speed and 
comprehension, and reading mechanics. Ex- 
perts stress that anyone’s reading can be im- 
proved. While professional guidance and 
special equipment are needed for difficult cases
and major advancement, Reading Lab per- 
sonnel point out you can progress on your 
own. A good vocabulary is essential to read- 
ing skill. As a business executive your own 
is well above average now, but there is always 
room for improvement. There are several 
good systems now on the market if you feel 
you need them. Tests show, incidentally, a
high correlation between vocabulary level and
general executive ability. (Iron Age, March
5, 1953.)





Book Reviews 


Arsenian, S., Ed. In Memoriam—Rudolf
Pintner. Washington, D. C.: Gallaudet
College Press, 1953. Pp. 1-63. Gratis.

The idea of a memorial volume for Dr.
Pintner originated with his many former stu-
dents, colleagues, and friends and was carried
through to completion by Seth Arsenian, Edi-
tor and Chairman of the Pintner Memorial
Committee. The volume contains a portrait
of Pintner (1884-1942), a foreword by the
Editor, a tribute prepared for the Faculty of
Philosophy of Columbia University by H. L.
Hollingworth shortly after Dr. Pintner’s un-
timely death, and an annotated bibliography
of Pintner’s contributions beginning in 1912.
There are 182 annotations in all for the 30
years or an average of six per year.

Copies are being distributed to college and 
university libraries in the United States, and
psychologists who are interested in securing 
a copy may write to Gallaudet College. 

This is an appropriate type of memorial be- 
cause Dr. Pintner was an indefatigable worker 
and the quality and quantity of his research 
articles, books, and tests helped put Ameri-
can psychology in a position of world leader-
ship in the field of mental measurements. It
is especially worthwhile to have this extensive 
annotated bibliography available in view of 
the fact that the ambitious plans of Murchi- 
son to publish bibliographies in successive 
editions of The Psychological Register were 
abandoned for financial reasons some twenty 
years ago. 

Donald G. Paterson 

The University of Minnesota 


Karn, H. W. and Gilmer, B. von H. Read- 
ings in industrial and business psychology. 
New York: McGraw-Hill, 1952. Pp. 476. 
$4.50. 

Blum, M. L. Readings in experimental in- 
dustrial psychology. New York: Prentice- 
Hall, 1952. Pp. 455. $4.75. 

Both these books were prepared primarily 
as supplementary texts for courses in indus- 
trial psychology. 

Readings in Industrial and Business Psy-
chology consists of 53 selections, mostly re-
cent journal articles, covering topics com-
monly found in industrial psychology texts.
Most of the articles are neither popular nor 
highly technical. The book would be ap- 
propriate for either graduate or undergrad- 
uate students. About half the articles report 
research studies and the others are discursive 
or theoretical. Titles for the eleven parts are: 
Motivation and Morale, Training in Indus- 
try, Analysis and Evaluation of Job Perform- 
ance, Psychological Tests, Interviewing and 
Counseling, Accidents and Safety, Fatigue 
and Worker Efficiency, Market Research, In- 
dustrial Leadership, Industrial Relations, and 
Psychologists in Industry. The editors have 
provided a three or four sentence summary 
preceding each article. 

Readings in Experimental Industrial Psy- 
chology has 63 recent journal articles. Nearly 
all the selections present results of research 
studies. About a third of the book is de- 
voted to common textbook topics: Employee 
Selection, Application Blanks, Training, Mo- 
tivation and Production, Labor Relations, and 
Music in Industry. Another third of the book 
is given to Engineering Problems, Display and 
Control Design Studies done for the Air Force, 
and Research in Visibility and Legibility. 
The remaining third of the book includes 
Marketing Research, and chapters on three 
new measurement techniques: the Flesch 
Formula, Forced Choice, and Critical Inci- 
dents. The editor has prepared a one or two 
page introduction for each of the 14 chapters 
in which he discusses, in a very readable man- 
ner, the importance of and main problems of 
each research area, and also summarizes the 
articles that have been selected for that sec- 
tion. 

How should one go about making a critical 
appraisal of a book of readings in industrial 
psychology? If we ask that the book include 
only articles that were important new contri- 
butions to the field when they first appeared 
then both books are weak. Articles are in- 
cluded in each book which present neither 
new ideas nor important research findings. 
If we expect to find only articles which are 
models of careful and thorough research again 
we will be disappointed in these books. Re-
search studies are presented which are faulty
in design and which arrive at unjustified con-
clusions. For instance, both books report
validity studies in which item analysis is used 
without any cross validation of results. In 
none of these instances do the writers point 
out that their findings are what Cureton has 
rightly called “baloney.” We might ask 
whether the books give a realistic picture of 
present day industrial psychology. Both 
books fail to meet this requirement also. 
This is not the fault of the editors, how- 
ever, as they were limited by the available 
supply of articles. Many industrial psy- 
chologists do not publish their research at 
all. Research that results in so-called “nega- 
tive” results is frequently not reported and 
the cumulative effect of publishing only “posi- 
tive” findings is quite misleading. Those 
psychologists in the industrial field who do 
publish their work usually do not give an 
adequate account of the many practical dif- 
ficulties they have faced in carrying out 
worthwhile research and in clearly demon- 
strating the significance and value of their 
findings. 

The following criteria, however, seem more 
important to me as a basis for selecting 
articles for a book of readings in industrial 
psychology. 

1. The writer should have a worthwhile 
point to make and should do so clearly and 
briefly but at the same time adequately. 

2. The articles should cover as wide a
variety of problems and approaches as possi-
ble. There should be a minimum of over-
lapping.

3. The articles as a group should emphasize
the use of scientific methodology in indus- 
trial psychology. 

4. The articles should be stimulating ma- 
terial for group discussion or individual cri- 
ticism. 

In general, both books meet these four re- 
quirements very well. Nearly all the articles 
are very clearly written and need little or no 
explanation by the editors. Blum has been 
especially successful in emphasizing the use 
of the scientific method. Karn and Gilmer 
provide an excellent sampling of the kinds of 
problems psychologists in industry have most 
frequently dealt with in recent years. Blum,
on the other hand, gives considerable space
to topics that psychologists in industry have 
not generally devoted their attention to up to 
now. Not many psychologists have been con- 
cerned with equipment design and “biome- 
chanics” for instance. But these are new and 
stimulating areas for research and represent 
fields in which psychologists may be able to 
make many important contributions in the 
future. Both books can easily be used to 
stimulate discussion and criticism. Experi- 
mental research which is reported clearly is 
always good for this purpose. Both of these 
books, in my opinion, will be found useful by 
many instructors for training students in the 
field of business and industrial psychology.
Philip H. Kriedt 
Prudential Insurance Company, 
Newark, N. J. 


Steiner, Lee R. A practical guide for troubled
people. New York: Greenberg, 1952. Pp. 
299. $3.50. 

This book “is intended for the individual,
still in possession of his reasoning powers,
but who, nevertheless, feels the need for some
guidance with his life problems . . . to en-
able him to select the most adequate advisor
for his particular woe.” The author’s earlier 
volume Where Do People Take Their Trou- 
bles? exposed the quack. Now Steiner ex- 
poses the professional consultant. The cases 
presented are not single individuals since the 
author says, “Both the seekers and the prac- 
titioners, as presented here, are composite 
characters.” The “composite character” is,
of course, the standby of the fiction writer— 
not of the objective reporter writing for peo- 
ple who need correct information. 

Several professions are explored: psychia- 
try, psychosomatic medicine, psychoanalysis 
(medical and non-medical), psychology, social 
work, and ministry. There are chapters on 
books as aids to the cure of personal prob- 
lems, on good old-fashioned advice, and on 
solving one’s own problems. 

The author uses a standard pattern for ex- 
ploring each profession. There is some de- 
scription of the field and the training re- 
quired for practitioners. A case history or 
two shows that even in the professions
“quacks” exist and that mediocrity is some-
times encountered. More cases are given to
present the profession in a better light. Then, 
there are a few words suggesting how one can 
select a psychiatrist, a psychologist, or other 
professional consultant. 

This is a disappointing book for several 
reasons, two of which have been selected for 
comment. First, the audience is not kept 
clearly in mind. Why should intelligent peo- 
ple with troubles read page after page about 
the interprofessional tensions and the con- 
fusions of the professions which deal with 
problems of personality? Does knowledge 
that social workers, psychologists, and psy- 
chiatrists differ on who should do therapy 
really help the beautiful Mrs. Kimball who 
is bored at being socially successful and rich? 
There is too much misdirected, non-construc- 
tive, verbal finger-pointing, some of it further 
confused by professional jargon. 

The author assumes the reader will under-
stand terms like libidinal, censor, id, Freudian,
Rankian, Jungian, logical construct, and non-
direction. The intelligent reader who has not
specialized in the behavior sciences is for-
gotten.

Second, the chapter on psychology has both 
errors of fact and dubious evaluations. For 
instance, is the following sentence generally 
true concerning psychologists a couple of dec- 
ades ago, or now? “Having thus decided to 
study pure science, they promptly concen- 
trated on rats and hamsters, never permitting 
the study of human animals to pollute their 
findings.” Did Thorndike, Terman, Allport, 
Thurstone, F. L. Wells, Rogers, and Lewin 
concentrate on rats? And, many people con- 
sider rat-men Hull and Tolman to have con- 
tributed mightily to psychotherapy and social 
psychology. Even though most psychologists 
would acknowledge some basis for Steiner’s 
comments, what contribution can a dozen 
pages of ridicule of academic psychology make 
to the troubled persons for whom the Guide 
is written? 

How many readers of this review would 
accept her statement that interpreting apti- 
tude tests is “often called occupationalogy 
(sic)”? In a small sample, this reviewer
found no specialist in the field who ever had
heard this term. Consider this statement: 
“If the counsellor is specializing in vocational
guidance, most of the good employment agen- 
cies and school bureaus would have an ac- 
curate idea of his worth.” Does Steiner really 
believe that managers of employment agencies 
are especially competent to evaluate a psy-
chologist’s work? She suggests writing to the
NVGA “to check the caliber of any vocational 
guidance service.” But in this 1952 book she 
gives the New York address from which 
NVGA moved to Washington in 1949. It is 
hard to reconcile her view that the NVGA 
listing can guide one to a sound vocational 
guidance service with her derision of the de- 
scriptions of members in the APA directory 
and her failure to mention the ABEPP di- 
ploma. She ends the chapter thus: “Choose 
your counsellor with caution.” But “How?” 
seems to be a minimized feature in this “how 
to do it” book. 

Steiner’s purpose is laudable. It is un- 
fortunate that she has written for too many 
audiences and produced such a muddled book. 
Her writing at times seems to show a fine 
understanding of the problems of populariza- 
tion. But the lapses into mixed and unclear 
metaphors (“the sterile vision in which she 
now stews”), the use of professional jargon 
with its semblance of cleverness, the hanging 
of our multi-pieced, and considerably un- 
washed, interprofessional laundry on the pub- 
lic street, all lead the reviewer to conclude 
that this book will not serve its purpose. 

Many common-sense and correct sugges- 
tions for securing good professional help are 
in this book. E.g., work through your family 
doctor; go through a reputable social agency. 
However, they are enmeshed among too many 
peculiarly emphasized points which are ir- 
relevant to people who buy this book as a 
“guide.” The need still persists for a good 
pamphlet giving authentic suggestions to peo- 
ple who want help on problems of personal 
adjustment. In a few dozen pages one should 
be able to steer people away from quacks and 
toward reputable professional consultants. 

Harold Seashore

The Psychological Corporation,
New York, N. Y.








New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson,
Editor, Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota. 


Construction of educational and personnel tests.
Kenneth L. Bean. New York: McGraw-Hill Book
Company, 1953. Pp. 231. $4.50.

Short employment tests. George K. Bennett and
Marjorie Gelink. New York: The Psychological
Corporation, 1953. Pp. 10.

A manual for the state-wide testing programs of
Minnesota. Ralph F. Berdie, Wilbur L. Layton,
and Theda Hagenah. Minneapolis: University of
Minnesota Press, 1953. Pp. 86. $1.00.

Effective use of older workers. Elizabeth Breckin-
ridge. Chicago: Wilcox & Follett Co., 1953. Pp.
224. $4.00.

Company practices in marketing research. Richard
D. Crisp. New York: American Management As-
sociation, 1953. Pp. 63. $2.50.

The psychology of learning. James E. Deese. New
York: McGraw-Hill Book Company, 1953. Pp.
384. $5.50.

Personality and psychotherapy. John Dollard and
Neal E. Miller. New York: McGraw-Hill Book
Company, 1953. Pp. 483. $5.50.

Business planning in a changing world. M. J.
Dooher, Editor. New York: American Manage-
ment Association, 1953. Pp. 51. $1.25.

Making the most of your human resources. M. J.
Dooher, Editor. New York: American Manage-
ment Association, 1953. Pp. 76. $1.25.

Making personnel practices and programs pay off.
M. J. Dooher, Editor. New York: American
Management Association, 1953. Pp. 64. $1.25.

Evaluating sales training needs and methods. M. J.
Dooher, Editor. New York: American Manage-
ment Association, 1953. Pp. 32. $1.25.

Markets and marketing techniques. M. J. Dooher,
Editor. New York: American Management Asso-
ciation, 1953. Pp. 47. $1.25.

Research methods in the behavioral sciences. Leon 
Festinger and Daniel Katz. New York: The 
Dryden Press, Publishers, 1953. Pp. 660. $5.90. 


How to evaluate students. Henrietta Fleck. Bloom- 
ington, Illinois: McKnight & McKnight, 1953. Pp. 
85. $1. 

Measurement and evaluation in the elementary
school. H. A. Greene, A. N. Jorgenson, and J. R.
Gerberich. New York: Longmans, Green and
Co., Inc., 1953. Pp. 617. $5.00.

Juvenile delinquency with the MMPI. Starke R.
Hathaway and Elio D. Monachesi. Minneapolis: 
University of Minnesota Press, 1953. Pp. 153. 
$3.50. 


The education of exceptional children. Arch O. Heck. 
New York: McGraw-Hill Book Company, 1953. 
Pp. 513. $6.00.

Measurement in education. A. N. Jordan. New 
York: McGraw-Hill Book Company, 1953. Pp. 
533. $5.25. 

Practical guidance methods. Robert H. Knapp.
New York: McGraw-Hill Book Company, 1953.
Pp. 320. $4.25.


Age and achievement. Harvey C. Lehman. Prince-
ton: Princeton University Press, 1953. Pp. 358.
$7.50.


Measuring educational achievement. W. J. Michaels 
and M. Ray Karnes. New York: McGraw-Hill 
Book Company, 1953. Pp. 496. $5.50. 

Satisfactions in the white-collar job. Nancy C.
Morse. Ann Arbor: University of Michigan, Sur- 
vey Research Center, Institute for Social Research, 
1953. Pp. 235. $3.50. 

The influence of instructional sets on Minnesota
teacher attitude inventory scores. William Rabino- 
witz. New York: College of the City of New 
York, 1953. Pp. 19. 

Communication in management. Charles E. Red-
field. Chicago: University of Chicago Press, 1953.
Pp. 290. $3.75.


The insight test. Helen D. Sargent. New York: 
Grune & Stratton, Inc., 1953. Pp. 276. $6.75. 

Industrial psychology. (3rd Ed.) Joseph Tiffin. 
New York: Prentice-Hall, Inc., 1953. Pp. 559.
$5.00. 

Profitably using the general staff position in busi- 
ness. Lyndall F. Urwick and Ernest Dale. New 
York: American Management Association, 1953. 
Pp. 35. $1.25. 

Motivation and morale in industry. Morris S. 
Viteles. New York: W. W. Norton & Company, 
Inc., 1953. Pp. 510. $9.50. 

Statistical inference. Helen M. Walker and Joseph 
Lev. New York: Henry Holt and Company, 1953. 
Pp. 510. $6.25. 

Indirect methods of attitude measurement. Irving R. 
Weschler and Raymond E. Bernberg. Los Angeles: 
University of California, 1953. Pp. 138. 

Management techniques for foremen. Richard W. 
Wetherwill. New London, Connecticut: National 
Foremen’s Institute, Inc., 1953. Pp. 177. $7.50. 

Community services for older people. Community 
Project for the Aged, Welfare Council of Chicago. 
Chicago: Wilcox & Follett Co., 1953. $4.00. 

Army personnel tests and measurement. Depart- 
ment of the Army. Washington, D. C.: United 
States Government Printing Office, 1953. Pp. 125. 
55 cents. 

Health and human relations. The Josiah Macy, Jr. 
Foundation. New York: The Blakiston Company, 
Inc., 1953. Pp. 270. $6.00. 

Group guidance of parents of mentally retarded chil- 
dren, 20 cents; Parents’ groups and the problem 
of mental retardation, 20 cents; Speaker’s manual, 
$1.50. New York: Association for the Help of 
Retarded Children. 

The Three R’s for the retarded. National Associa-
tion for Retarded Children. 50 cents. Order from
Mrs. Emily Kucirek, 2904 Oberlin Avenue, Lorain,
Ohio.


ETHICAL STANDARDS
OF PSYCHOLOGISTS 


In September 1952 the Council of Representatives of 
the American Psychological Association adopted Ethical
Standards of Psychologists as official policy of the Associ- 
ation. The standards, which are provisional, will be used 
for a three-year trial period. They will be revised, as 
necessary, and will be considered by the Council for final 
action in 1955. 


The Education and Training Board of the APA has 
recommended the use of Ethical Standards of Psychologists 
in graduate training programs. 


Also available is a smaller booklet, Ethical Standards 
of Psychologists, A Summary of Ethical Principles, which 
presents in brief the major tenets of the code. 


Prices: 


Ethical Standards of Psychologists, 186 pages, $1.00. 
Discounts for quantity orders. 


Ethical Standards of Psychologists, A Summary of 
Ethical Principles, 18 pages, 10¢. Quantity orders of the 
Summary: 10 copies, 75¢; 25 copies, $1.50 


Order from 


American Psychological Association 
1333 Sixteenth Street N. W. 
Washington 6, D. C. 




















INDUSTRIAL PSYCHOLOGY, 3rd Edition 
By JOSEPH TIFFIN, Professor of Industrial Psychology, Purdue University 


Shows how psychology is used by industry in personnel selection, operator and 
supervisory training, attitude measurement, employee counseling, merit rating 
and other related areas. A “how-to-do-it” text based almost entirely on actual 
work done in real plants with real people. 


Adds new topics: patterned interview; forced choice and paired comparison 
merit rating systems; visual requirements of job families; General Motors’
“My Job Contest” approach to attitude measurement; clarification of differ- 
ence between descriptive and sampling statistics. 








Expands treatment of such topics as: personnel data in relation to job success;
criteria of job success; factors related to accident frequency and severity;
application of the Taylor-Russell Tables. 
640 pages . 5%" 28%" . Published 1952 
Work book—2nd Edition . 88 pages . 8%" rll” . Published 1952 


Answers available on adoption (Restricted) . Objective Tests free on adoption (from
author) 


READINGS IN EXPERIMENTAL INDUSTRIAL PSYCHOLOGY 


Edited by MILTON L. BLUM, Assoc. Professor of Psychology and Sub-
Chairman, Dept. of Psychology, School of Civic and Business Admin., City
College of New York


This collection of articles shows how psychologists use a variety of techniques
to obtain objective data on numerous problems concerned with man and his 
work. The student practically looks over the shoulders of the investigators 
as they apply the experimental method to observed facts and draw their con- 
clusions. Covers 5 areas of industrial psychology: Personnel Problems— 
Human Relations—Engineering Psychology—The Consumer and Advertising 
—Newer Concepts. 


459 pages . 6" x 9" . November 1952
Typical Adoptions (1952, 1953) 
JOHNS HOPKINS UNIVERSITY    UNIV. OF NEW HAMPSHIRE    BOSTON UNIVERSITY
UNIVERSITY OF MINNESOTA     UNIVERSITY OF ARKANSAS    TUFTS COLLEGE
UNIV. OF SOUTHERN CALIF.    UNIVERSITY OF HOUSTON     ILLINOIS INST. OF TECH.


For approval copies write
PRENTICE-HALL, Inc., 70 FIFTH AVENUE, NEW YORK 11, N. Y.
































